Re: [PATCH] New default role allowing to change per-role/database settings

2021-07-22 Thread Michael Banck
Hi,

On Wednesday, 14.07.2021 at 21:29 +0530, vignesh C wrote:
> On Wed, Apr 7, 2021 at 5:23 PM Michael Banck  
> wrote:
> > Hi,
> > 
> > On Tuesday, 06.04.2021 at 15:37 +0200, Michael Banck wrote:
> > > On Monday, 05.04.2021 at 14:33 -0400, Stephen Frost wrote:
> > > > Should drop the 'DEFAULT_' to match the others since the rename to
> > > > 'predefined' roles went in.
> > > 
> > > Right, will send a rebased patch ASAP.
> > 
> > Here's a rebased version that also removes the ALTER DATABASE SET
> > capability from this new predefined role and adds some documentation.
> 
> The patch does not apply on Head anymore, could you rebase and post a
> patch. I'm changing the status to "Waiting for Author".

Thanks for letting me know, I've attached a rebased v4 of this patch, no
other changes.
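
For anyone wanting to try it out, a minimal usage sketch (the role names
here are invented for illustration, they are not part of the patch):

  CREATE ROLE app_user LOGIN;
  CREATE ROLE log_admin LOGIN;
  GRANT pg_change_role_settings TO log_admin;
  -- log_admin can now adjust the PGC_ADMINSET logging/statistics settings
  -- of other (non-superuser) roles without being a superuser itself:
  SET ROLE log_admin;
  ALTER ROLE app_user SET log_statement = 'all';
  ALTER ROLE app_user SET log_min_duration_statement = '250ms';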


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
E-Mail: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Sascha Heuer, Geoff Richardson, Peter 
Lilley

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
From 144306ed2835d33b2c1b2f5c7b5c247c41d80df0 Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Tue, 15 Dec 2020 20:53:59 +0100
Subject: [PATCH v4] Add new PGC_ADMINSET guc context and
 pg_change_role_settings predefined role.

This commit introduces `administrator' as a middle ground between `superuser'
and `user' for guc contexts.

Those settings initially include all previously `superuser'-context settings
from the 'Reporting and Logging' ('What/When to Log' but not 'Where to Log')
and both 'Statistics' categories. Further settings could be added in follow-up
commits.

Also, a new predefined role pg_change_role_settings is introduced.  This allows
non-superuser administrators who have been granted this role to change the
above PGC_ADMINSET settings on a per-role basis, as in "ALTER ROLE [...] SET
log_statement".

The rationale is to allow certain settings that affect logging to be changed
in setups where the DBA has access to the database log, but is not a full
superuser.
---
 doc/src/sgml/ref/alter_role.sgml  | 11 +++---
 doc/src/sgml/user-manag.sgml  |  7 
 src/backend/commands/user.c   |  7 +++-
 src/backend/utils/misc/guc.c  | 61 +++
 src/include/catalog/pg_authid.dat |  5 +++
 src/include/utils/guc.h   |  1 +
 6 files changed, 62 insertions(+), 30 deletions(-)

diff --git a/doc/src/sgml/ref/alter_role.sgml b/doc/src/sgml/ref/alter_role.sgml
index 5aa5648ae7..d332c29c71 100644
--- a/doc/src/sgml/ref/alter_role.sgml
+++ b/doc/src/sgml/ref/alter_role.sgml
@@ -115,11 +115,12 @@ ALTER ROLE { role_specification | A
 
   
Superusers can change anyone's session defaults. Roles having
-   CREATEROLE privilege can change defaults for non-superuser
-   roles. Ordinary roles can only set defaults for themselves.
-   Certain configuration variables cannot be set this way, or can only be
-   set if a superuser issues the command.  Only superusers can change a setting
-   for all roles in all databases.
+   CREATEROLE privilege or which are a member of the
+   pg_change_role_settings predefined role can change
+   defaults for non-superuser roles. Ordinary roles can only set defaults for
+   themselves.  Certain configuration variables cannot be set this way, or can
+   only be set if a superuser issues the command.  Only superusers can change a
+   setting for all roles in all databases.
   
  
 
diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml
index fe0bdb7599..48a7aa8cc2 100644
--- a/doc/src/sgml/user-manag.sgml
+++ b/doc/src/sgml/user-manag.sgml
@@ -541,6 +541,13 @@ DROP ROLE doomed_role;
Read all configuration variables, even those normally visible only to
superusers.
   
+  
+   pg_change_role_settings
+   Allow changing role-based settings for all non-superuser roles
+   with ALTER ROLE rolename SET
+   varname TO
+   value.
+  
   
pg_read_all_stats
Read all pg_stat_* views and use various statistics related extensions,
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index aa69821be4..652f8aca21 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -876,7 +876,8 @@ AlterRoleSet(AlterRoleSetStmt *stmt)
 
 		/*
 		 * To mess with a superuser you gotta be superuser; else you need
-		 * createrole, or just want to change your own settings
+		 * createrole, the pg_change_role_settings default role, or just want
+		 * to change your own settings.
 		 */
 		if (roleform->rolsuper)
 		{
@@ -887,7 +888,9 @@ AlterRoleSet(AlterRoleSetStmt *stmt)
 		}
 		else
 		{
-			if (!have_createrole_privilege() && roleid != GetUserId())
+			if (!have_createrole_privilege() &&
+				roleid != GetUserId() &&
+

RE: [PATCH] Add tab-complete for backslash commands

2021-07-22 Thread tanghy.f...@fujitsu.com
On Thursday, July 15, 2021 6:46 PM, tanghy.f...@fujitsu.com 
 wrote:
>Attached a patch to improve the tab completion for backslash commands.
>I think it's common for some people (I'm one of them) to use full-name
>commands rather than abbreviations.
>So it's more convenient if we can add the full-name backslash commands to
>tab-complete.c.

Added the above patch to the commitfest:

https://commitfest.postgresql.org/34/3268/

Regards,
Tang





Re: logical replication empty transactions

2021-07-22 Thread Ajin Cherian
On Fri, Jul 23, 2021 at 10:13 AM Peter Smith  wrote:
>
> I have reviewed the v9 patch and my feedback comments are below:
>
> //
>
> 1. Apply v9 gave multiple whitespace warnings

Fixed.

>
> --
>
> 2. Commit comment - wording
>
> pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages
> for transactions which were skipped.
>
> =>
>
> Is that correct? Or did you mean to say:
>
> AFTER
> pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages
> for transactions that are empty.
>
> --

Updated.

>
> 3. src/backend/replication/pgoutput/pgoutput.c - typo
>
> + /*
> + * If the BEGIN PREPARE was not yet sent, then it means there were no
> + * relevant changes encountered, so we can skip the COMMIT PREPARED
> + * messsage too.
> + */
>
> Typo: "messsage" --> "message"
>
> (NOTE this same typo is in 2 places)
>
Fixed.

I have made these changes in v10 of the patch.

regards,
Ajin Cherian
Fujitsu Australia




Re: logical replication empty transactions

2021-07-22 Thread Ajin Cherian
On Fri, Jul 23, 2021 at 10:26 AM Greg Nancarrow  wrote:
>
> On Thu, Jul 22, 2021 at 11:37 PM Ajin Cherian  wrote:
> >
>
> I have some minor comments on the v9 patch:
>
> (1) Several whitespace warnings on patch application
>

Fixed.

> (2) Suggested patch comment change:
>
> BEFORE:
> The current logical replication behaviour is to send every transaction to
> subscriber even though the transaction is empty (because it does not
> AFTER:
> The current logical replication behaviour is to send every transaction to
> subscriber even though the transaction might be empty (because it does not
>
Changed accordingly.

> (3) Comment needed for added struct defn:
>
> typedef struct PGOutputTxnData
>

Added.

> (4) Improve comment.
>
> Can you add a comma (or add words) in the below sentence, so we know
> how to read it?
>

Updated.

regards,
Ajin Cherian
Fujitsu Australia


v10-0001-Skip-empty-transactions-for-logical-replication.patch
Description: Binary data


RE: [PATCH] support tab-completion for single quote input with equal sign

2021-07-22 Thread tanghy.f...@fujitsu.com
On Thursday, July 22, 2021 1:05 PM, tanghy.f...@fujitsu.com 
 wrote
>I found a problem when using tab-completion as follows:
>
>CREATE SUBSCRIPTION my_subscription 
>CONNECTION 'host=localhost port=5432 dbname=postgres' [TAB]
>
>The word 'PUBLICATION' couldn't be auto completed as expected.

Added the above patch to the commitfest:

https://commitfest.postgresql.org/34/3267/

Regards,
Tang




Re: pg_amcheck: Fix block number parsing on command line

2021-07-22 Thread Michael Paquier
On Thu, Jul 22, 2021 at 04:56:56PM +0200, Peter Eisentraut wrote:
> It seems to me that when using the pg_amcheck --startblock and --endblock
> options on platforms where sizeof(long) == 4, you cannot specify higher
> block numbers (unless you do tricks with negative numbers).  The attached
> patch should fix this by using strtoul() instead of strtol().  I also
> tightened up the number scanning a bit in other ways, similar to the code in
> other frontend utilities.  I know some people have been working on
> tightening all this up.  Please check that it's up to speed.

Yeah, some work is happening to tighten all that.  That said, the
first round of review I did for the option parsing did not involve
option types other than int32, so the block options of pg_amcheck are
not changing for now, and what you are suggesting here is fine for the
moment for 14~.  Still, note that I am planning to change that on HEAD
with an option parsing API for int64 that could be used for block
numbers, eliminating for example the 4 translatable strings for the
code being changed here.
--
Michael


signature.asc
Description: PGP signature


Re: [BUG]Update Toast data failure in logical replication

2021-07-22 Thread Amit Kapila
On Thu, Jul 22, 2021 at 8:02 PM Dilip Kumar  wrote:
>
> On Thu, Jul 22, 2021 at 4:11 PM Amit Kapila  wrote:
> >
> > On Thu, Jun 3, 2021 at 5:15 PM Dilip Kumar  wrote:
> > >
> > > On Wed, Jun 2, 2021 at 7:23 PM Dilip Kumar  wrote:
> > > >
> > > > On Wed, Jun 2, 2021 at 7:20 PM Kuntal Ghosh 
> > > >  wrote:
> > > > >
> > > > > On Wed, Jun 2, 2021 at 3:10 PM Dilip Kumar  
> > > > > wrote:
> > > > > >
> > > > > > Yes, you are right.  I will change it in the next version, along 
> > > > > > with
> > > > > > the test case.
> > > > > >
> > > > > +/*
> > > > > + * if the key hasn't changed and we're only logging the key, 
> > > > > we're done.
> > > > > + * But if tuple has external data then we might have to detoast 
> > > > > the key.
> > > > > + */
> > > > > This doesn't really mention why we need to detoast the key even when
> > > > > the key remains the same. I guess we can add some more details here.
> > > >
> > > > Noted, let's see what others have to say about fixing this, then I
> > > > will fix this along with one other pending comment and I will also add
> > > > the test case.  Thanks for looking into this.
> > >
> > > I have fixed all the pending issues, I see there is already a test
> > > case for this so I have changed the output for that.
> > >
> >
> > IIUC, this issue occurs because we don't log the actual key value for
> > unchanged toast key. It is neither logged as part of old_key_tuple nor
> > for new tuple due to which we are not able to find it in the
> > apply-side when we searched it via FindReplTupleInLocalRel. Now, I
> > think it will work if we LOG the entire unchanged toasted value as you
> > have done in the patch but can we explore some other way to fix it. In
> > the subscriber-side, can we detect that the key column has toasted
> > value in the new tuple and try to first fetch it and then do the index
> > search for the fetched toasted value? I am not sure about the
> > feasibility of this but wanted to see if we can someway avoid logging
> > unchanged toasted key value as that might save us from additional WAL.
>
> Yeah if we can do this then it will be a better approach, I think as
> you mentioned we can avoid logging unchanged toast key data.  I will
> investigate this next week and update the thread.
>

Okay, thanks. I think one point we need to consider here is that on
the subscriber side, we use a dirty snapshot to search the key, so we
need to ensure that we don't fetch the wrong data. I am not sure what
will happen when, by the time we try to search the tuple on the
subscriber side for the update, it has been removed and re-inserted
with the same values by the user. Do we find the newly inserted tuple
and update it? If so, can it also happen even if we logged the unchanged
old_key_tuple as the patch is doing currently?
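
For reference, a self-contained sketch of the scenario under discussion
(table and column names are made up, not taken from the patch):

  CREATE TABLE toasted_key (id int, key text NOT NULL, cnt int);
  CREATE UNIQUE INDEX toasted_key_idx ON toasted_key (key);
  ALTER TABLE toasted_key REPLICA IDENTITY USING INDEX toasted_key_idx;
  ALTER TABLE toasted_key ALTER COLUMN key SET STORAGE EXTERNAL;
  INSERT INTO toasted_key VALUES (1, repeat('x', 8000), 0);
  -- key is unchanged and stored out of line, so its value is logged
  -- neither in the old_key_tuple nor in the new tuple; the subscriber
  -- then cannot find the row via FindReplTupleInLocalRel:
  UPDATE toasted_key SET cnt = cnt + 1;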

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2021-07-22 Thread Amit Kapila
On Fri, Jul 23, 2021 at 8:29 AM Amit Kapila  wrote:
>
> On Thu, Jul 22, 2021 at 8:06 PM Dilip Kumar  wrote:
> >
> > On Thu, Jul 22, 2021 at 5:15 PM Amit Kapila  wrote:
> > >
> > > On Tue, Jul 20, 2021 at 4:33 PM Dilip Kumar  wrote:
> > > >
> > > > On Tue, Jul 20, 2021 at 3:43 PM Tomas Vondra
> > > >  wrote:
> > > > >
> > > > > Do we log the TOAST-ed values that were not updated?
> > > >
> > > > No, we don't, I have submitted a patch sometime back to fix that [1]
> > > >
> > >
> > > That patch seems to log WAL for key unchanged columns. What about if
> > > unchanged non-key columns? Do they get logged as part of the new tuple
> > > or is there some other way we can get those? If not, then we need to
> > > probably think of restricting filter clause in some way.
> >
> > But what sort of restrictions? I mean we cannot restrict based on data
> > type, right? That will be too restrictive;
> >
>
> Yeah, a data type restriction sounds too restrictive, and unless the data
> is toasted the data will be available anyway. I think such a
> restriction should be the last resort, but let's try to see if we can
> do something better.
>
> > the other option is to only allow
> > replica identity key columns in the filter condition?
> >
>
> Yes, that is what I had in mind because if key column(s) are changed
> then we will have data for both old and new tuples. But if they are not
> changed then we will probably have it for the old tuple, unless we
> decide to fix the bug you mentioned in a different way, in which case
> we might either need to log it for the purpose of this feature (but
> that will be for HEAD anyway) or need to come up with some other
> solution here. I think we can't even fetch such columns' data during
> decoding because we have catalog-only historic snapshots here. Do you
> have any better ideas?
>

BTW, I wonder how pglogical can handle this, because if these unchanged
toasted values are not logged in WAL for the new tuple, then how will the
comparison for such columns work? Either they are forcing WAL in
some way, or they don't allow a WHERE clause on such columns, or maybe they
have dealt with it in some other way, unless they are unaware of this
problem.

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2021-07-22 Thread Amit Kapila
On Thu, Jul 22, 2021 at 8:06 PM Dilip Kumar  wrote:
>
> On Thu, Jul 22, 2021 at 5:15 PM Amit Kapila  wrote:
> >
> > On Tue, Jul 20, 2021 at 4:33 PM Dilip Kumar  wrote:
> > >
> > > On Tue, Jul 20, 2021 at 3:43 PM Tomas Vondra
> > >  wrote:
> > > >
> > > > Do we log the TOAST-ed values that were not updated?
> > >
> > > No, we don't, I have submitted a patch sometime back to fix that [1]
> > >
> >
> > That patch seems to log WAL for key unchanged columns. What about if
> > unchanged non-key columns? Do they get logged as part of the new tuple
> > or is there some other way we can get those? If not, then we need to
> > probably think of restricting filter clause in some way.
>
> But what sort of restrictions? I mean we cannot restrict based on data
> type, right? That will be too restrictive;
>

Yeah, a data type restriction sounds too restrictive, and unless the data
is toasted the data will be available anyway. I think such a
restriction should be the last resort, but let's try to see if we can
do something better.

> the other option is to only allow
> replica identity key columns in the filter condition?
>

Yes, that is what I had in mind because if key column(s) are changed
then we will have data for both old and new tuples. But if they are not
changed then we will probably have it for the old tuple, unless we
decide to fix the bug you mentioned in a different way, in which case
we might either need to log it for the purpose of this feature (but
that will be for HEAD anyway) or need to come up with some other
solution here. I think we can't even fetch such columns' data during
decoding because we have catalog-only historic snapshots here. Do you
have any better ideas?
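
To make the idea concrete, a sketch using the patch's WHERE syntax (table
and publication names assumed):

  CREATE TABLE t (id int PRIMARY KEY, payload text);
  -- id is part of the replica identity, so a filter on it would be allowed:
  CREATE PUBLICATION pub_ok FOR TABLE t WHERE (id > 100);
  -- payload is a non-key column whose old value may be missing from WAL
  -- (e.g. an unchanged toasted value), so under the restriction being
  -- discussed a filter on it would be rejected:
  CREATE PUBLICATION pub_bad FOR TABLE t WHERE (payload <> '');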

-- 
With Regards,
Amit Kapila.




Re: Added schema level support for publication.

2021-07-22 Thread Greg Nancarrow
On Fri, Jul 23, 2021 at 10:56 AM tanghy.f...@fujitsu.com
 wrote:
>
>
> After applying your V13 patch, I noticed that if I specify duplicate schema
> names when using "ALTER PUBLICATION ... SET SCHEMA ...", I would get the
> following error message:
>
> postgres=# ALTER PUBLICATION pub1 SET SCHEMA s1,s1;
> ERROR:  duplicate key value violates unique constraint "pg_publication_sch_psnspcid_pspubid_index"
> DETAIL:  Key (psnspcid, pspubid)=(16406, 16405) already exists.
>

That definitely seems to be a bug, since "ALTER PUBLICATION ... SET
TABLE ..." ignores duplicates and there is no ERROR.
"CREATE PUBLICATION ... SET SCHEMA s1, s1;" and "ALTER PUBLICATION ...
ADD SCHEMA s1, s1;" also give the same kind of error.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Slightly improve initdb --sync-only option's help message

2021-07-22 Thread Justin Pryzby
On Thu, Jul 22, 2021 at 10:32:18PM +, Bossart, Nathan wrote:
> On 7/6/21, 7:02 PM, "Gurjeet Singh"  wrote:
> > I think it would be helpful if the help message was slightly more
> > descriptive. Some options:
> >
> > Used in patch:
> >  only sync data directory; does not modify any data
> >
> > To match the wording of --sync-only option:
> > write contents of data directory to disk; helpful after --no-sync option
> >
> > Clearly specify the system operation used for the option
> > perform fsync on data directory; helpful after --no-sync option
> 
> I think the help message should say exactly what the option does and
> should avoid saying what it does not do or how it may be useful.  I
> would suggest the following to match the initdb docs [0]:
> 
>   -S, --sync-only   safely write all database files to disk and exit
> 
> IMO the note about the option being helpful after using the --no-sync
> option would fit better in the docs, but I'm struggling to think of a
> use case for using --no-sync and then calling initdb again with
> --sync-only.  Why wouldn't you just leave out --no-sync the first
> time?

It's to allow safely running bulk loads with fsync=off - if the bulk load
fails, you can wipe out the partially-loaded cluster and start over.
Transitioning back to a durable state then requires more than just setting
fsync=on, which only enables future fsync calls; it also requires syncing all
dirty buffers.
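
In rough SQL terms the transition looks like this (a sketch; the final
flush is the part initdb --sync-only covers, per the docs quoted below):

  -- after the bulk load has succeeded:
  ALTER SYSTEM SET fsync = on;
  SELECT pg_reload_conf();
  -- this only makes *future* writes durable; the buffers dirtied while
  -- fsync was off still have to be flushed, e.g. by stopping the cluster
  -- and running "initdb --sync-only -D $PGDATA"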

doc/src/sgml/config.sgml-   
doc/src/sgml/config.sgml-For reliable recovery when changing fsync
doc/src/sgml/config.sgml-off to on, it is necessary to force all modified buffers in the
doc/src/sgml/config.sgml-kernel to durable storage.  This can be done while the cluster
doc/src/sgml/config.sgml-is shutdown or while fsync is on by running initdb
doc/src/sgml/config.sgml:--sync-only, running sync, unmounting the
doc/src/sgml/config.sgml-file system, or rebooting the server.
doc/src/sgml/config.sgml-   

-- 
Justin




RE: Added schema level support for publication.

2021-07-22 Thread tanghy.f...@fujitsu.com
On Thursday, July 22, 2021 1:30 AM vignesh C  wrote:
> 
> Thanks for reporting this issue, this issue is fixed in the attached v13 
> patch.
> I have changed the relation name pg_publication_schema to
> pg_publication_sch so that the names are along similar lines to the
> pg_publication_rel relation, and similar changes were done for variable
> names too.

Thanks for your fix. The issue is resolved as you said.

After applying your V13 patch, I noticed that if I specify duplicate schema
names when using "ALTER PUBLICATION ... SET SCHEMA ...", I would get the
following error message:

postgres=# ALTER PUBLICATION pub1 SET SCHEMA s1,s1;
ERROR:  duplicate key value violates unique constraint "pg_publication_sch_psnspcid_pspubid_index"
DETAIL:  Key (psnspcid, pspubid)=(16406, 16405) already exists.

I think the error message is pretty hard to understand. Maybe we can do
something to improve this scenario.

Here are two proposals:
1. Don't report an error message; just add some code to make the above command
   execute successfully, just like "ALTER PUBLICATION ... SET TABLE ..." as
   follows:

postgres=# ALTER PUBLICATION pub2 SET TABLE t1,t1;
ALTER PUBLICATION
postgres=# \dRp+ pub2
                                   Publication pub2
  Owner   | All tables | Inserts | Updates | Deletes | Truncates | Via root | Pubtype
----------+------------+---------+---------+---------+-----------+----------+---------
 postgres | f          | t       | t       | t       | t         | f        | t
Tables:
    "public.t1"

2. Report an easily understandable message like: Schema s1 specified more
   than once.

Thoughts?

Regards
Tang


Re: logical replication empty transactions

2021-07-22 Thread Greg Nancarrow
On Thu, Jul 22, 2021 at 11:37 PM Ajin Cherian  wrote:
>

I have some minor comments on the v9 patch:

(1) Several whitespace warnings on patch application

(2) Suggested patch comment change:

BEFORE:
The current logical replication behaviour is to send every transaction to
subscriber even though the transaction is empty (because it does not
AFTER:
The current logical replication behaviour is to send every transaction to
subscriber even though the transaction might be empty (because it does not

(3) Comment needed for added struct defn:

typedef struct PGOutputTxnData

(4) Improve comment.

Can you add a comma (or add words) in the below sentence, so we know
how to read it?

+ /*
+ * Delegate to assign the begin sent flag as false same as for the
+ * BEGIN message.
+ */


Regards,
Greg Nancarrow
Fujitsu Australia




Re: logical replication empty transactions

2021-07-22 Thread Peter Smith
I have reviewed the v9 patch and my feedback comments are below:

//

1. Apply v9 gave multiple whitespace warnings

$ git apply v9-0001-Skip-empty-transactions-for-logical-replication.patch
v9-0001-Skip-empty-transactions-for-logical-replication.patch:479: indent with spaces.
* If the BEGIN PREPARE was not yet sent, then it means there were no
v9-0001-Skip-empty-transactions-for-logical-replication.patch:480: indent with spaces.
* relevant changes encountered, so we can skip the ROLLBACK PREPARED
v9-0001-Skip-empty-transactions-for-logical-replication.patch:481: indent with spaces.
* messsage too.
v9-0001-Skip-empty-transactions-for-logical-replication.patch:482: indent with spaces.
*/
warning: 4 lines add whitespace errors.

--

2. Commit comment - wording

pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages
for transactions which were skipped.

=>

Is that correct? Or did you mean to say:

AFTER
pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages
for transactions that are empty.

--

3. src/backend/replication/pgoutput/pgoutput.c - typo

+ /*
+ * If the BEGIN PREPARE was not yet sent, then it means there were no
+ * relevant changes encountered, so we can skip the COMMIT PREPARED
+ * messsage too.
+ */

Typo: "messsage" --> "message"

(NOTE this same typo is in 2 places)

--

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Rename of triggers for partitioned tables

2021-07-22 Thread Alvaro Herrera
On 2021-Jul-22, Alvaro Herrera wrote:

> I pushed this patch with some minor corrections (mostly improved the
> message wording.)

Hmm, the sorting fails for Czech or something like that.  Will fix ...

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/




Re: Avoid repeated PQfnumber() in pg_dump

2021-07-22 Thread Bossart, Nathan
On 7/15/21, 12:48 PM, "Daniel Gustafsson"  wrote:
> While unlikely to be common, very wide tables aren’t unheard of.  Either way, I
> think it has merit to pull out the PQfnumber before the loop even if it’s a
> wash performance wise for many users, as it’s a pattern used elsewhere in
> pg_dump.

+1

The patch looks good to me.  I am marking it as ready-for-committer.

Nathan



Re: Use WaitLatch for {pre, post}_auth_delay instead of pg_usleep

2021-07-22 Thread Bossart, Nathan
On 7/12/21, 9:00 AM, "Bharath Rupireddy" 
 wrote:
> As suggested in [1], starting a new thread for discussing $subject
> separately. {pre, post}_auth_delay waiting  logic currently uses
> pg_usleep which can't detect postmaster death. So, there are chances
> that some of the backends still stay in the system even when a
> postmaster crashes (for whatever reasons it may be). Please have a
> look at the attached patch that does $subject. I pulled out some of
> the comments from the other thread related to the $subject, [2], [3],
> [4], [5].

+ 
+  PostAuthDelay
+  Waiting on connection startup after authentication to allow attach
+  from a debugger.
+ 
+ 
+  PreAuthDelay
+  Waiting on connection startup before authentication to allow
+  attach from a debugger.
+ 

I would suggest changing "attach from a debugger" to "attaching with a
debugger."

    if (PreAuthDelay > 0)
-        pg_usleep(PreAuthDelay * 1000000L);
+    {
+        /*
+         * Do not use WL_LATCH_SET during backend initialization because the
+         * MyLatch may point to shared latch later.
+         */
+        (void) WaitLatch(MyLatch,
+                         WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+                         PreAuthDelay * 1000L,
+                         WAIT_EVENT_PRE_AUTH_DELAY);
+    }

IIUC you want to use the same set of flags as PostAuthDelay for
PreAuthDelay, but the stated reason in this comment for leaving out
WL_LATCH_SET suggests otherwise.  It's not clear to me why the latch
possibly pointing to a shared latch in the future is an issue.  Should
this instead say that we leave out WL_LATCH_SET for consistency with
PostAuthDelay?
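
On the plus side, the delays then become observable as wait events;
something like this (assuming the patch's event names, and that the
backend is already visible in pg_stat_activity) could be used to confirm
the behavior:

  SELECT pid, wait_event_type, wait_event
  FROM pg_stat_activity
  WHERE wait_event IN ('PreAuthDelay', 'PostAuthDelay');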

Nathan



Re: Rename of triggers for partitioned tables

2021-07-22 Thread Alvaro Herrera
I pushed this patch with some minor corrections (mostly improved the
message wording.)

Thanks

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica" (Irulan)




Re: Slightly improve initdb --sync-only option's help message

2021-07-22 Thread Bossart, Nathan
On 7/6/21, 7:02 PM, "Gurjeet Singh"  wrote:
> I think it would be helpful if the help message was slightly more
> descriptive. Some options:
>
> Used in patch:
>  only sync data directory; does not modify any data
>
> To match the wording of --sync-only option:
> write contents of data directory to disk; helpful after --no-sync option
>
> Clearly specify the system operation used for the option
> perform fsync on data directory; helpful after --no-sync option

I think the help message should say exactly what the option does and
should avoid saying what it does not do or how it may be useful.  I
would suggest the following to match the initdb docs [0]:

  -S, --sync-only   safely write all database files to disk and exit

IMO the note about the option being helpful after using the --no-sync
option would fit better in the docs, but I'm struggling to think of a
use case for using --no-sync and then calling initdb again with
--sync-only.  Why wouldn't you just leave out --no-sync the first
time?

Nathan

[0] https://www.postgresql.org/docs/devel/app-initdb.html



Re: Have I found an interval arithmetic bug?

2021-07-22 Thread Zhihong Yu
On Thu, Jul 22, 2021 at 2:59 PM Zhihong Yu  wrote:

>
>
> On Wed, Jul 21, 2021 at 6:43 PM Bruce Momjian  wrote:
>
>> On Wed, Jul 21, 2021 at 06:39:26PM -0700, Bryn Llewellyn wrote:
>> > Your statement
>> >
>> >
>> > “months-to-days conversion is almost always an approximation, while
>> the
>> > days to seconds conversion is almost always accurate.”
>> >
>> >
>> > is misleading. Any conversion like these (and also the “spill up”
>> conversions
>> > that the justify_hours(), justify_days(), and justify_interval()
>> built-in
>> > functions bring) are semantically dangerous because of the different
>> rules for
>> > adding a pure months, a pure days, or a pure seconds interval to a
>> timestamptz
>> > value.
>>
>> We are trying to get the most reasonable output for fractional values
>> --- I stand by my statements.
>>
>> > Unless you avoid mixed interval values, then it’s so hard (even though
>> it is
>> > possible) to predict the outcomes of interval arithmetic. Rather, all
>> you get
>> > is emergent behavior that I fail to see can be relied upon in
>> deliberately
>> > designed application code. Here’s a telling example:
>>
>> The point is that we will get unusual values, so we should do the best
>> we can.
>>
>> --
>>   Bruce Momjian  https://momjian.us
>>   EDB  https://enterprisedb.com
>>
>>   If only the physical world exists, free will is an illusion.
>>
>> Hi,
>
> -   tm->tm_mon += (fval * MONTHS_PER_YEAR);
> +   tm->tm_mon += rint(fval * MONTHS_PER_YEAR);
>
> Should the handling for year use the same check as that for month ?
>
> -   AdjustFractDays(fval, tm, fsec, DAYS_PER_MONTH);
> +   /* round to a full month? */
> +   if (rint(fval * DAYS_PER_MONTH) == DAYS_PER_MONTH)
>
> Cheers
>
Hi,
I guess the reason for the current patch was that the year-to-months
conversion is accurate.
On the new test:

+SELECT INTERVAL '1.16 months 01:00:00' AS "One mon 5 days one hour";

0.16 * 31 = 4.96 < 5

I wonder why 5 days were chosen in the test output.
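
(Note: the interval code assumes 30 days per month - DAYS_PER_MONTH in
src/include/datatype/timestamp.h - so with the patch the fraction works
out to:

  0.16 month * 30 days/month = 4.8 days, and rint(4.8) = 5

which would explain the 5 days, if I am reading it right.)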


Re: Have I found an interval arithmetic bug?

2021-07-22 Thread Zhihong Yu
On Wed, Jul 21, 2021 at 6:43 PM Bruce Momjian  wrote:

> On Wed, Jul 21, 2021 at 06:39:26PM -0700, Bryn Llewellyn wrote:
> > Your statement
> >
> >
> > “months-to-days conversion is almost always an approximation, while
> the
> > days to seconds conversion is almost always accurate.”
> >
> >
> > is misleading. Any conversion like these (and also the “spill up”
> conversions
> > that the justify_hours(), justify_days(), and justify_interval() built-in
> > functions bring) are semantically dangerous because of the different
> rules for
> > adding a pure months, a pure days, or a pure seconds interval to a
> timestamptz
> > value.
>
> We are trying to get the most reasonable output for fractional values
> --- I stand by my statements.
>
> > Unless you avoid mixed interval values, then it’s so hard (even though
> it is
> > possible) to predict the outcomes of interval arithmetic. Rather, all
> you get
> > is emergent behavior that I fail to see can be relied upon in
> deliberately
> > designed application code. Here’s a telling example:
>
> The point is that we will get unusual values, so we should do the best
> we can.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   If only the physical world exists, free will is an illusion.
>
> Hi,

-   tm->tm_mon += (fval * MONTHS_PER_YEAR);
+   tm->tm_mon += rint(fval * MONTHS_PER_YEAR);

Should the handling for year use the same check as that for month ?

-   AdjustFractDays(fval, tm, fsec, DAYS_PER_MONTH);
+   /* round to a full month? */
+   if (rint(fval * DAYS_PER_MONTH) == DAYS_PER_MONTH)

Cheers


Re: Corrected documentation of data type for the logical replication message formats.

2021-07-22 Thread Peter Smith
I think the patch may not be quite correct for all the flags.

For example,

@@ -7607,44 +7615,44 @@ are available since protocol version 3.
 
 Int8
 
-Flags; currently unused (must be 0).
+Flags (uint8); currently unused.
 
 

AFAIK, even though the flags are "unused", the code still insists that
most (or all? Please check the code) of these flag values MUST be 0,
so I think that this zero value requirement ought to be indicated in
the docs using the "Int8(0)" convention [1]. For example,

BEFORE
Int8
Flags (uint8); currently unused.

AFTER
Int8(0)
Flags (uint8); currently unused.

--
[1] https://www.postgresql.org/docs/devel/protocol-message-types.html

Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: truncating timestamps on arbitrary intervals

2021-07-22 Thread John Naylor
On Thu, Jul 22, 2021 at 4:49 PM Bauyrzhan Sakhariyev <baurzhansahar...@gmail.com> wrote:
> Not related to negative intervals - I created a PR for adding a zero check
> for stride https://github.com/postgres/postgres/pull/67 and after getting
> it closed I stopped right there - a 1-line check isn't worth going through
> the patching process I'm not familiar with.

Thanks for the pull request! I added tests and reworded the error message
slightly to match current style, and pushed.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-07-22 Thread Mark Dilger



> On Jul 22, 2021, at 11:21 AM, Robert Haas  wrote:
> 
> When we come to the
> third thing the patch includes in this category, creating and dropping
> event triggers, I *really* don't understand why that one is considered
> host security. That one isn't touching the filesystem even to the
> extent that the extension stuff is; it seems to me to be purely
> internal to the database. Yeah, OK, that could involve writing files
> because we make catalog entries, but so could any DDL. Now, maybe
> there's a theory of operation that you have in mind that makes this
> all make more sense the way you have it, but if so, it seems not to be
> spelled out anywhere in the patch itself or the commit message you
> wrote for it, so I'm in the dark.

I agree with the need to document the theory behind all this.  Event triggers 
are dangerous because they can trap a superuser into executing code they do not 
intend:

  create table super_special (big_darn_secret integer);
  revoke all privileges on super_special from public;
  insert into super_special values (42);
  -- imagine that "untrustworth_bob" is a member of as-yet unimplemented role
  -- pg_database_security, and has the ability to create event triggers; to 
simulate
  -- that, we'll put bob into superuser temporarily while the event trigger is
  -- created, then remove superuser.
  create role untrustworthy_bob superuser;
  set session authorization untrustworthy_bob;
  create function update_super_special_big_darn_secret() returns event_trigger as $$
  begin
    -- note that non-superusers should draw an error if they try this
    update super_special set big_darn_secret = big_darn_secret + 1;

    -- note that non-pg_host_security roles should draw an error if they try this
    perform pg_rotate_logfile();
  end;
  $$ language plpgsql;
  create event trigger secret_sauce on sql_drop
    execute procedure update_super_special_big_darn_secret();
  reset session authorization;
  alter role untrustworthy_bob nosuperuser;
  set session authorization untrustworthy_bob;
  update super_special set big_darn_secret = big_darn_secret + 1;
  ERROR:  permission denied for table super_special
  select pg_rotate_logfile();
  ERROR:  permission denied for function pg_rotate_logfile
  reset session authorization;
  select * from super_special;
   big_darn_secret
  -
42
  (1 row)

  create table foo_tmp (t integer);
  drop table foo_tmp;
  WARNING:  rotation not possible because log collection not active
  select * from super_special;
   big_darn_secret
  -
43
  (1 row)

When the superuser dropped table foo_tmp, pg_rotate_logfile() got called, as 
did an update of table super_special.  Any other function could have been 
called from there instead.  That's a big deal.  If creating event triggers is 
delegated to nonsuperuser members of pg_database_security, I think it means 
that pg_database_security has a privilege escalation path to become superuser.  
Since pg_host_security already has such an escalation path, it makes more sense 
to me to use this role for event trigger creation.  The argument for dropping 
event triggers is less clear, but it seems that event triggers may be used to 
implement an auditing system, and we wouldn't want pg_database_security to be 
enough privilege to circumvent the auditing system.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Avoiding data loss with synchronous replication

2021-07-22 Thread Bossart, Nathan
Hi hackers,

As previously discussed [0], canceling synchronous replication waits
can have the unfortunate side effect of making transactions visible on
a primary server before they are replicated.  A failover at this time
would cause such transactions to be lost.  The proposed solution in
the previous thread [0] involved blocking such cancellations, but many
had concerns about that approach (e.g., backends could be
unresponsive, server restarts were still affected by this problem).  I
would like to propose something more like what Fujii-san suggested [1]
that would avoid blocking cancellations while still preventing data
loss.  I believe this is a key missing piece of the synchronous
replication functionality in PostgreSQL.

AFAICT there are a variety of ways that the aforementioned problem may
occur:
  1. Server restarts: As noted in the docs [2], "waiting transactions
 will be marked fully committed once the primary database
 recovers."  I think there are a few options for handling this,
     but the simplest would be to simply fail over anytime the primary
     server shuts down.  My proposal may offer other ways of helping
 with this.
  2. Backend crashes: If a backend crashes, the postmaster process
 will restart everything, leading to the same problem described in
 1.  However, this behavior can be prevented with the
 restart_after_crash parameter [3].
  3. Client disconnections: During waits for synchronous replication,
 interrupt processing is turned off, so disconnected clients
 actually don't seem to cause a problem.  The server will still
 wait for synchronous replication to complete prior to making the
 transaction visible on the primary.
  4. Query cancellations and backend terminations: This appears to be
 the only gap where there is no way to avoid potential data loss,
 and it is the main target of my proposal.
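
To make item 4 concrete, a minimal reproduction sketch (the standby name
and table are invented for the example):

  ALTER SYSTEM SET synchronous_standby_names = 'standby1';
  SELECT pg_reload_conf();
  CREATE TABLE payments (id int PRIMARY KEY);
  BEGIN;
  INSERT INTO payments VALUES (1);
  COMMIT;  -- blocks, waiting for standby1 to confirm
  -- cancelling the wait here (Ctrl-C or pg_cancel_backend) raises a
  -- WARNING, but the transaction is already committed locally and now
  -- becomes visible; a failover at this moment loses it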

Instead of blocking query cancellations and backend terminations, I
think we should allow them to proceed, but we should keep the
transactions marked in-progress so they do not yet become visible to
sessions on the primary.  Once replication has caught up to the
necessary point, the transactions can be marked completed, and they
would finally become visible.

The main advantages of this approach are 1) it still allows for
canceling waits for synchronous replication and 2) it provides an
opportunity to view and manage waits for synchronous replication
outside of the standard cancellation/termination functionality.  The
tooling for 2 could even allow a session to begin waiting for
synchronous replication again if it "inadvertently interrupted a
replication wait..." [4].  I think the main disadvantage of this
approach is that transactions committed by a session may not be
immediately visible to the session when the command returns after
canceling the wait for synchronous replication.  Instead, the
transactions would become visible in the future once the change is
replicated.  This may cause problems for an application if it doesn't
handle this scenario carefully.

What are folks' opinions on this idea?  Is this something that is
worth prototyping?

Nathan

[0] 
https://www.postgresql.org/message-id/flat/c1f7905e-5db2-497d-abcc-e14d4dee5...@yandex-team.ru
[1] 
https://www.postgresql.org/message-id/4f8d54c9-6f18-23d5-c4de-9d6656d3a408%40oss.nttdata.com
[2] 
https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION-HA
[3] 
https://www.postgresql.org/docs/devel/runtime-config-error-handling.html#GUC-RESTART-AFTER-CRASH
[4] 
https://www.postgresql.org/message-id/CA%2BTgmoZpwBEyPDZixeHN9ZeNJJjd3EBEQ8nJPaRAsVexhssfNg%40mail.gmail.com



Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-22 Thread Michael Paquier
On Thu, Jul 22, 2021 at 09:42:00AM -0400, Alvaro Herrera wrote:
> May I suggest for the second sentence something like "If the parsing is
> successful, returns true and stores the result in *result if that's
> given; if parsing fails, returns false"

Sounds fine to me.  Thanks.
--
Michael


signature.asc
Description: PGP signature


Re: Autovacuum on partitioned table (autoanalyze)

2021-07-22 Thread Andres Freund
Hi,

On 2021-04-08 01:20:14 -0400, Alvaro Herrera wrote:
> On 2021-Apr-07, Alvaro Herrera wrote:
>
> > OK, I bit the bullet and re-did the logic in the way I had proposed
> > earlier in the thread: do the propagation on the collector's side, by
> > sending only the list of ancestors: the collector can read the tuple
> > change count by itself, to add it to each ancestor.  This seems less
> > wasteful.  Attached is v16 which does it that way and seems to work
> > nicely under my testing.
>
> Pushed with this approach.  Thanks for persisting with this.

I'm looking at this in the context of rebasing & polishing the shared
memory stats patch.

I have a few questions / concerns:

1) Somehow it seems like a violation to do stuff like
   get_partition_ancestors() in pgstat.c. It's nothing I can't live with, but
   it feels a bit off. Would likely not be too hard to address, e.g. by just
   putting some of pgstat_report_anl_ancestors in partition.c instead.

2) Why does it make sense that autovacuum sends a stats message for every
   partition in the system that had any changes since the last autovacuum
   cycle? On a database with a good number of objects / a short naptime we'll
   often end up sending messages for the same set of tables from separate
   workers, because they don't yet see the concurrent
   tabentry->changes_since_analyze_reported.

3) What is the goal of the autovac_refresh_stats() after the loop doing
   pgstat_report_anl_ancestors()? I think it'll be common that the stats
   collector hasn't even processed the incoming messages by that point, not to
   speak of actually having written out a new stats file. If it took less than
   10ms (PGSTAT_RETRY_DELAY) to get to autovac_refresh_stats(),
   backend_read_statsfile() will not wait for a new stats file to be written
   out, and we'll just re-read the state we previously did.

   It's pretty expensive to re-read the stats file in some workloads, so I'm a
   bit concerned that we end up significantly increasing the amount of stats
   updates/reads, without actually gaining anything reliable?

4) In the shared mem stats patch I went to a fair bit of trouble to try to get
   rid of pgstat_vacuum_stat() (which scales extremely poorly to larger
   systems). For that to work pending stats can only be "staged" while holding
   a lock on a relation that prevents the relation from being concurrently
   dropped (pending stats increment a refcount for the shared stats object,
   which ensures that we don't lose track of the fact that a stats object has
   been dropped, even when stats only get submitted later).

   I'm not yet clear on how to make this work for
   pgstat_report_anl_ancestors() - but I probably can find a way. But it does
   feel a bit off to issue stats stuff for tables we're not sure still exist.


I'll go and read through the thread, but my first thought is that having a
hashtable in do_autovacuum() that contains stats for partitioned tables would
be a good bit more efficient than the current approach? We already have a
hashtable for each toast table, compared to that having a hashtable for each
partitioned table doesn't seem like it'd be a problem?

With a small bit of extra work that could even avoid the need for the
additional pass through pg_class. Do the partitioned table data-gathering as
part of the "collect main tables to vacuum" pass, and then do one of

a) do the perform-analyze decision purely off the contents of that
   partitioned-table hash
b) fetch the RELOID syscache entry by oid and then decide based on that
c) handle partitioned tables as part of the "check TOAST tables" pass - it's
   not like we gain a meaningful amount of efficiency by using a ScanKey to
   filter for RELKIND_TOASTVALUE, given that there's no index, and that an
   index wouldn't commonly be useful given the percentage of toast tables in
   pg_class

Partitioning makes it a bigger issue that do_autovacuum() does multiple passes
through pg_class (as it makes scenarios in which pg_class is large more
common), so I don't think it's great that partitioning also increases the
number of passes through pg_class.

Greetings,

Andres Freund




Re: truncating timestamps on arbitrary intervals

2021-07-22 Thread Bauyrzhan Sakhariyev
> No, the boundary is intentionally the earlier one:

I found that commit on GitHub, thanks for pointing it out.
When I test the origin-in-the-future case locally I get different results for
positive and negative intervals (see queries #1 and #2 above; they
have the same timestamp, origin and interval magnitude, and differ only in
the interval's sign) - can it be that the version I downloaded from
https://www.enterprisedb.com/postgresql-early-experience doesn't include
the commit with that improvement?

>  I wonder if we should just disallow negative intervals here.

I cannot imagine somebody using a negative interval as a constant argument,
but users can pass another column as the first argument, or some
function(ts) - not likely but possible. A line in the docs about the leftmost
point of the interval being the start of the bin could be helpful.

Not related to negative intervals - I created a PR for adding a zero check
for stride https://github.com/postgres/postgres/pull/67 and after getting it
closed I stopped right there - a 1-line check isn't worth going through the
patching process I'm not familiar with.

> In the case of full units (1 minute, 1 hour, etc.), it gives the same
> result as the analogous date_trunc call,
It was not obvious to me that we need to supply a Monday origin to make
date_bin(1 week, ts) produce the same result as date_trunc.
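
For example (a sketch; 2001-01-01 falls on a Monday, which is the only
reason the two results agree):

  SELECT date_trunc('week', timestamp '2021-07-22 12:00');
  -- 2021-07-19 00:00:00
  SELECT date_bin('7 days', timestamp '2021-07-22 12:00', timestamp '2001-01-01');
  -- 2021-07-19 00:00:00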

Sorry for the verbose report and thanks for the nice function - I know
it's not yet released, I was just playing around with the beta as I want to
align CrateDB's date_bin with PostgreSQL's.

On Thu, Jul 22, 2021 at 7:28 PM John Naylor 
wrote:

>
> On Thu, Jul 22, 2021 at 12:24 PM Bauyrzhan Sakhariyev <
> baurzhansahar...@gmail.com> wrote:
> >
> > Is date_bin supposed to return the beginning of the bin?
>
> Thanks for testing! And yes.
>
> > And does the sign of an interval define the "direction" of the bin?
>
> No, the boundary is intentionally the earlier one:
>
> /*
>  * Make sure the returned timestamp is at the start of the bin, even if
>  * the origin is in the future.
>  */
> if (origin > timestamp && stride_usecs > 1)
> tm_delta -= stride_usecs;
>
> I wonder if we should just disallow negative intervals here.
>
> --
> John Naylor
> EDB: http://www.enterprisedb.com
>


Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-07-22 Thread Robert Haas
On Thu, Jul 22, 2021 at 3:00 PM Stephen Frost  wrote:
> I'm a bit on the fence about the documentation side...  I could be
> convinced either way, really, but I generally agree that it'd be good to
> pick one and be consistent.  I don't think the places where we do/don't
> mention it were done for any particular reason.
>
> > So I don't know. At the very least I think we should not do it as an
> > "or" in the code; what we want to do in comments and documentation I'm
> > less sure.
>
> Agreed.

Thanks for weighing in.

> > I also tend to think that functions like pg_read_file() ought to come
> > with execute permission pre-granted, with grant option, to
> > pg_host_security, and perhaps similarly for adminpack.
>
> When it comes to these- we already have pg_read_server_files and
> pg_write_server_files, so I'm not sure I see why it'd make sense to have
> another thing that grants filesystem access like this..?

It's awkward. I think that we can't afford to create a separate
predefined role for every single thing that someone could
theoretically want to sever, because then we'll have a zillion of them
and it will be unmaintainable. So the idea was to try to break up
everything someone might want to do either via DDL or by setting GUCs
into one of three categories: internal to the database
(pg_database_security), facing outward toward the network
(pg_network_security), and facing inward toward the host
(pg_host_security). If we didn't have any predefined security-related
roles already, this would still have complications, but as things
stand it has more, because as you point out, pg_read_server_files
overlaps with pg_host_security. But what do we do about that? Neither
pg_read_server_files nor pg_write_server_files covers the ability to
create tablespaces or set log_directory, but I think it's more
desirable to lump those things together in one bucket than to have a
ton of separate buckets for each individual thing. I guess one option
would be to grant the existing roles pg_read_server_files and
pg_write_server_files to the new pg_host_security role, or whatever we
decide to call it (pg_access_server_filesystem?
pg_administer_server_files? pg_hack_postgres_account?). But I'm open
to suggestions. See also below here on the overall intent.

> I'm also left wondering if this doesn't end up introducing opportunities
> for someone with this role to become superuser pretty easily.  Maybe it
> does and maybe we're ok with that, but I would think that it'd be really
> useful to have a role that can't become superuser easily which can
> access/modify most objects in the system.

Creating something like that is precisely the intention here because,
like you, I think that would be extremely handy. If it's possible for
that role to become superuser, we've lost the plot.

> I don't really see either of those as being filesystem changing things.

I think the thought process here was that if you are a managed
services provider you would not want the user to change
zero_damaged_pages or wal_sync_method or things like that because that
stuff is the provider's responsibility; similar for the recovery
settings. But yes ... we need something better here, I think.

> It's often the case that logging/auditing are handled by a different
> group than those who might be creating/modifying objects.  Yet another
> group is often the group that actually handles granting access.  Broad
> classes being:
>
> - Users
> - Auditors (controls what's logged, what is audited, etc)
> - Security (controls who has access to what)
>
> Note that 'security' and 'auditors' shouldn't have access to the actual
> data either, or have the ability to do things like modify data.  Not
> sure all of this quite fits what we're going for here but figured it
> might help with sorting out what other buckets we need.

Hmm, interesting point. The division between the "security" group, who
I suppose would be the DBA, and the "auditors" group is one I had
thought about only slightly.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Followup Timestamp to timestamp with TZ conversion

2021-07-22 Thread Tom Lane
Robert Haas  writes:
> I agree that it doesn't follow in general. I think it does in the case
> of timestamp and timestamptz, because I don't think either the choice
> of time zone or the fact that we're reckoning relative to a time zone
> can change which of two timestamps is considered earlier. However, I
> think the only infrastructure we have for proving that is to look to
> see whether it's the same operator family in both cases. Because
> timestamp_ops and timestamptz_ops are separate, that doesn't help
> here.

Right.  It would in fact work for these two types, but we do not have
infrastructure that would allow us to know that.  I'm not sure about
your idea that "same operator family" is enough.

(Even for these two types, while a plain btree index should be fine,
I think it wouldn't be hard to construct expression indexes that
would not be compatible.  So there's a lot of worms in that can.)

regards, tom lane




Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-07-22 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Jul 1, 2021 at 11:59 AM Mark Dilger  
> wrote
> > > On Jun 29, 2021, at 6:25 PM, Mark Dilger  
> > > wrote:
> > > Please find attached a new set of patches.
> >
> > And again, this time attaching a fifth patch which includes the work to 
> > allow users who belong to the right security role to SET and ALTER SYSTEM 
> > SET variables without being a superuser.
> 
> In general, I don't like this kind of coding:
> 
> -   /* Superusers bypass all permission checking. */
> -   if (superuser_arg(roleid))
> +   /*
> +* Superusers and members of the pg_host_security role bypass all
> +* permission checking.
> +*/
> +   if (superuser_arg(roleid) ||
> +   has_privs_of_role(roleid, ROLE_PG_HOST_SECURITY))
> return true;
> 
> From a theoretical point of view, the superuser has the privileges of
> every role, so this is redundant. From a coding point of view,
> has_privs_of_role() does a superuser_arg() test already, and it seems
> likely to be a loser to do it twice. Arguably, we should take the same
> approach in code comments and documentation, rather than saying "You
> must be superuser or a member of role XYZ" everywhere, but there seems
> to be some existing precedent for mentioning superuser explicitly in
> those cases, so maybe it's fine. I think it's kind of weird though,
> because in other places we don't do it, e.g.:
> 
>   unique or primary key constraint in the referenced table.  The user
>   must have REFERENCES permission on the referenced 
> table
>   (either the whole table, or the specific referenced columns).  The

I tend to agree that it'd be better to clean this up and just use
has_privs_of_role() and not include explicit superuser checks.  I don't
think we need to constantly re-remind ourselves in the code that
superusers are members of all roles.

> We could have said "or be superuser" there, but we didn't. It doesn't
> seem efficient to say "or be superuser" every time we mention a
> required permission, rather than just taking it as a given that the
> superuser has all permissions. Yet, again, there's some precedent for
> your approach:
> 
>To create a database, you must be a superuser or have the special
>CREATEDB privilege.
>See .

I'm a bit on the fence about the documentation side...  I could be
convinced either way, really, but I generally agree that it'd be good to
pick one and be consistent.  I don't think the places where we do/don't
mention it were done for any particular reason.

> So I don't know. At the very least I think we should not do it as an
> "or" in the code; what we want to do in comments and documentation I'm
> less sure.

Agreed.

> I think 0002 needs more explanation of the theory behind the specific
> permissions granted. It enumerates what they are in both the commit
> message and the documentation, but no theory is offered explaining why
> these permissions are included and not others. I think my idea was
> that "host" security would encompass everything that touches the
> filesystem on the server where the database is running. I agree that
> this naturally includes the ability to create a tablespace and
> probably, at least for symmetry, the ability to drop it. But, you
> can't ALTER a tablespace's location, so I see no reason why that
> should be tied to this permission. I think it's arguable whether it
> includes creating and dropping extensions, but I would argue that it
> shouldn't. True, the extensions include SQL files installed in the
> filesystem, and shared libraries also installed on the filesystem, but
> ultimately everything you ever do involves files in some way, so I
> don't see that as a very compelling argument. These operations on
> extensions don't let you modify the filesystem in any way, and they
> only let you read from carefully sandboxed things that are designed
> for precisely that purpose, so the system administrator really already
> has good control. The sorts of things I'd include in this category are
> things like server-side COPY FROM or COPY TO. When we come to the
> third thing the patch includes in this category, creating and dropping
> event triggers, I *really* don't understand why that one is considered
> host security. That one isn't touching the filesystem even to the
> extent that the extension stuff is; it seems to me to be purely
> internal to the database. Yeah, OK, that could involve writing files
> because we make catalog entries, but so could any DDL. Now, maybe
> there's a theory of operation that you have in mind that makes this
> all make more sense the way you have it, but if so, it seems not to be
> spelled out anywhere in the patch itself or the commit message you
> wrote for it, so I'm in the dark.

I agree that installing extensions and event triggers are different
things, and if what we're left with is "create tablespaces" then maybe
we should have that as 

Re: window build doesn't apply PG_CPPFLAGS correctly

2021-07-22 Thread Pavel Stehule
On Thu, Jul 22, 2021 at 20:52, Andrew Dunstan wrote:

>
> On 7/22/21 9:41 AM, Alvaro Herrera wrote:
> > On 2021-Jul-22, Pavel Stehule wrote:
> >
> >> On Thu, Jul 22, 2021 at 14:04, Andrew Dunstan wrote:
> >>> Almost everything in the Makefiles is not used by the MSVC build system.
> >>> Using this one seems likely to be quite difficult, since the syntax for
> >>> the MSVC compiler command line is very different, and furthermore the
> >>> MSVC build system doesn't know anything about how to use this setting.
> >>>
> >>> AFAICT PG_CPPFLAGS is only used by pgxs.
> >>>
> >>> You would need to tell us more about how your build process is working.
> >> I need access to plpgsql.h at build time; this is the only dependency.
> >> When I build an extension, plpgsql.h is in a shared directory, but when
> >> I build a module for a test, the header files are not installed yet, so
> >> the build requires the include dir -I$(top_srcdir)/src/pl/plpgsql/src.
> > But Project.pm parses Makefiles and puts stuff into the MSVC buildsystem
> > file format; note David Rowley's patch that (among other things) removes
> > a bunch of ->AddIncludeDir calls by parsing PG_CPPFLAGS
> >
> > https://postgr.es/m/CAApHDvpXoav0aZnsji-ZNdo=9txqawnwmsh44gyn8k7i2pr...@mail.gmail.com
> > which is probably apropos.
> >
>
>
> Yeah, but that hasn't been applied yet. Pavel should be able to use what
> I gave him today, I think.
>

Yes, and it is working.

Thank you very much

Pavel


>
> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>


Re: window build doesn't apply PG_CPPFLAGS correctly

2021-07-22 Thread Andrew Dunstan


On 7/22/21 9:41 AM, Alvaro Herrera wrote:
> On 2021-Jul-22, Pavel Stehule wrote:
>
>> On Thu, Jul 22, 2021 at 14:04, Andrew Dunstan wrote:
>>> Almost everything in the Makefiles is not used by the MSVC build system.
>>> Using this one seems likely to be quite difficult, since the syntax for
>>> the MSVC compiler command line is very different, and furthermore the
>>> MSVC build system doesn't know anything about how to use this setting.
>>>
>>> AFAICT PG_CPPFLAGS is only used by pgxs.
>>>
>>> You would need to tell us more about how your build process is working.
>> I need access to plpgsql.h at build time; this is the only dependency.
>> When I build an extension, plpgsql.h is in a shared directory, but when I
>> build a module for a test, the header files are not installed yet, so the
>> build requires the include dir -I$(top_srcdir)/src/pl/plpgsql/src.
> But Project.pm parses Makefiles and puts stuff into the MSVC buildsystem
> file format; note David Rowley's patch that (among other things) removes
> a bunch of ->AddIncludeDir calls by parsing PG_CPPFLAGS
> https://postgr.es/m/CAApHDvpXoav0aZnsji-ZNdo=9txqawnwmsh44gyn8k7i2pr...@mail.gmail.com
> which is probably apropos.
>


Yeah, but that hasn't been applied yet. Pavel should be able to use what
I gave him today, I think.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Followup Timestamp to timestamp with TZ conversion

2021-07-22 Thread Robert Haas
On Thu, Jul 22, 2021 at 11:29 AM Tom Lane  wrote:
> As a thought experiment to prove that this is an issue, suppose that
> somebody invented an unsigned integer type, and made the cast from
> regular int4 follow the rules of a C cast, so that e.g. -1 becomes
> 2^32-1.  Given that, an ALTER TYPE from int4 to the unsigned type
> could skip the heap rewrite.  But we absolutely would have to rebuild
> any btree index on the column, because the sort ordering of the two
> types is different.  OTOH, it's quite likely that a hash index would
> not really need to be rebuilt.  So this is a real can of worms and
> we've not cared to open it.

I agree that it doesn't follow in general. I think it does in the case
of timestamp and timestamptz, because I don't think either the choice
of time zone or the fact that we're reckoning relative to a time zone
can change which of two timestamps is considered earlier. However, I
think the only infrastructure we have for proving that is to look to
see whether it's the same operator family in both cases. Because
timestamp_ops and timestamptz_ops are separate, that doesn't help
here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Column Filtering in Logical Replication

2021-07-22 Thread Alvaro Herrera
One thing I just happened to notice is this part of your commit message

: REPLICA IDENTITY columns are always replicated
: irrespective of column names specification.

... for which you don't have any tests -- I mean, create a table with a
certain REPLICA IDENTITY and later try to publish a set of columns that
doesn't include all the columns in the replica identity, then verify
that those columns are indeed published.
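
A sketch of such a test, assuming the patch's column-list syntax (names
hypothetical):

    CREATE TABLE t (id int PRIMARY KEY, payload text, secret text);
    -- the replica identity defaults to the primary key, i.e. (id)
    CREATE PUBLICATION p FOR TABLE t (payload);
    -- then verify on the subscriber side that "id" is in fact replicated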

Having said that, I'm not sure I agree with this design decision; what I
think this is doing is hiding from the user the fact that they are
publishing columns that they don't want to publish.  I think as a user I
would rather get an error in that case:

  ERROR:  invalid column list in published set
  DETAIL:  The set of published columns does not include all the replica
  identity columns.

or something like that.  That avoids possible nasty surprises of a
security-leaking nature.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"On the other flipper, one wrong move and we're Fatal Exceptions"
(T.U.X.: Term Unit X  - http://www.thelinuxreview.com/TUX/)




Re: refactoring basebackup.c

2021-07-22 Thread Robert Haas
On Thu, Jul 22, 2021 at 1:14 PM tushar  wrote:
> On 7/19/21 8:29 PM, Dilip Kumar wrote:
> > I am not sure why this is working, from the code I could not find if
> > the backup target is server then are we doing anything with the -R
> > option or we are just silently ignoring it
>
> OK, in another scenario I can see "-t server" working with the
> "--server-compression" option but not with -z or -Z?

Right. The error messages or documentation might need some work, but
it's expected that you won't be able to do client-side compression if
the backup is being sent someplace other than to the client.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-07-22 Thread Robert Haas
On Thu, Jul 22, 2021 at 1:29 PM Mark Dilger
 wrote:
> Certainly not.  What I meant on May 28 by "eventually" was that the patch set 
> (posted May 25 and named "v3") had not yet implemented such security, as I 
> was fishing for comments from the community about whether the basic division 
> of superuser into these new roles was the right division.  Having gotten 
> little feedback on that, on June 29 I posted another patch set (confusingly 
> also named "v3", my apologies) in which patch 0001 had expanded to include 
> new security restrictions.

Oh.

> Prior to this patch, the logical replication workers run under the userid of 
> the owner of the subscription.  This is unchanged after the patch.  The real 
> difference is that prior to the patch, only superusers can own subscriptions, 
> so checking permissions on tables during replication would be silly (though 
> not harmful).  The worker is assured of passing all such permission checks by 
> virtue of being a superuser.  After the patch, since subscription owners need 
> not be superusers, the permission checks are no longer silly.  There is no 
> assurance that they have permission to apply changes to a table, so naturally 
> that has to be checked, and it is.

Aren't you supposing that the set of superusers never changes? Unless
we have some code for this that we don't have elsewhere, a superuser
could create a subscription and then be de-superuser'd, or the
subscription's owner could be altered.
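
A sketch of that sequence (names hypothetical):

    -- as superuser "bob":
    CREATE SUBSCRIPTION sub CONNECTION 'dbname=src host=publisher'
        PUBLICATION pub;
    -- later, another superuser demotes bob:
    ALTER ROLE bob NOSUPERUSER;
    -- the subscription is still owned by bob, who is no longer a superuser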

> > I
> > also think it should be accompanied not only by new test cases (which
> > you seem to have added, though I have not reviewed them in detail) but
> > also documentation changes (which seem to be missing, since the doc
> > changes are all about the new predefined role). This is a really
> > significant behavior change to logical replication IMV and shouldn't
> > just be slipped into some other patch.
>
> I'm not sure what is meant by "slipped into some other patch", but I *think* 
> you mean that the documentation changes should not be in a separate patch 
> from the behavioral changes.  I agree with that.  I'll add documentation of 
> the changes to logical replication in the same patch as the changes 
> themselves.

I just meant that I think the behavioral change to logical replication
is significant in its own right and should be a separate patch.
Perhaps it's not as significant as I thought, but I still think it
should be made separately and likely documented as an incompatibility
with previous releases, unless I'm still confused.

> > It also seems based on Noah's comments and your response that there
> > might be some other issue here, and I haven't understood what that is,
> > but I think that should also be fixed separately, and first.
> > Considering all this, I would suggest not having this be patch #1 in
> > your series; make something come first that doesn't have
> > prerequisites.
>
> The issue that gets thrown around in the email archive is that "arbitrary 
> code" can be made to run on the subscriber side.  As I understand the 
> problem, this is because trigger functions can be created on tables with 
> arbitrary code in them, and that code will be executed under the userid of 
> the user who causes the trigger to fire during an insert/update/delete rather 
> than as the user who created the trigger.  This of course is not peculiar to 
> logical replication; it is how triggers work generally.  What is peculiar is 
> that a non-superuser who can create tables, triggers, publications and 
> subscriptions can get the logical replication worker to perform 
> inserts/updates/deletes on those tables, thereby firing those triggers, and 
> executing the trigger code as superuser.  That is ordinarily not something 
> that a user can do simply by creating a table with a trigger, since there 
> would be no mechanism to force the superuser to perform operations on the 
> table.
>
> After patch 0001 (which will be split in the next patch set, but hasn't been 
> split yet) the user who creates the subscription is also the user whose 
> permissions are checked when operating on the table and executing the 
> trigger.  This closes the security hole, so far as I am aware.  I would very 
> much like more eyeballs on this patch, and if anybody sees why this is an 
> insufficient solution, please speak up.  But it's not as if I punted the 
> security issue down the road to some ill-defined future patch.  On the 
> contrary, this patch both creates the ability to delegate subscription 
> creation authority to a non-superuser and closes the security hole which that 
> would otherwise entail, or at least, that is the intent.

OK. I thought Noah must be talking about some other problem, because
on May 28th you wrote "Oh, I agree that this patch set does not go the
extra step to make it safe" and I failed to understand that you
thought you'd addressed this in v4.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-07-22 Thread Robert Haas
On Thu, Jul 1, 2021 at 11:59 AM Mark Dilger  wrote:
> > On Jun 29, 2021, at 6:25 PM, Mark Dilger  
> > wrote:
> > Please find attached a new set of patches.
>
> And again, this time attaching a fifth patch which includes the work to allow 
> users who belong to the right security role to SET and ALTER SYSTEM SET 
> variables without being a superuser.

In general, I don't like this kind of coding:

-   /* Superusers bypass all permission checking. */
-   if (superuser_arg(roleid))
+   /*
+* Superusers and members of the pg_host_security role bypass all
+* permission checking.
+*/
+   if (superuser_arg(roleid) ||
+   has_privs_of_role(roleid, ROLE_PG_HOST_SECURITY))
return true;

From a theoretical point of view, the superuser has the privileges of
every role, so this is redundant. From a coding point of view,
has_privs_of_role() does a superuser_arg() test already, and it seems
likely to be a loser to do it twice. Arguably, we should take the same
approach in code comments and documentation, rather than saying "You
must be superuser or a member of role XYZ" everywhere, but there seems
to be some existing precedent for mentioning superuser explicitly in
those cases, so maybe it's fine. I think it's kind of weird though,
because in other places we don't do it, e.g.:

  unique or primary key constraint in the referenced table.  The user
  must have REFERENCES permission on the referenced table
  (either the whole table, or the specific referenced columns).  The

We could have said "or be superuser" there, but we didn't. It doesn't
seem efficient to say "or be superuser" every time we mention a
required permission, rather than just taking it as a given that the
superuser has all permissions. Yet, again, there's some precedent for
your approach:

   To create a database, you must be a superuser or have the special
   CREATEDB privilege.
   See .

So I don't know. At the very least I think we should not do it as an
"or" in the code; what we want to do in comments and documentation I'm
less sure.

I think 0002 needs more explanation of the theory behind the specific
permissions granted. It enumerates what they are in both the commit
message and the documentation, but no theory is offered explaining why
these permissions are included and not others. I think my idea was
that "host" security would encompass everything that touches the
filesystem on the server where the database is running. I agree that
this naturally includes the ability to create a tablespace and
probably, at least for symmetry, the ability to drop it. But, you
can't ALTER a tablespace's location, so I see no reason why that
should be tied to this permission. I think it's arguable whether it
includes creating and dropping extensions, but I would argue that it
shouldn't. True, the extensions include SQL files installed in the
filesystem, and shared libraries also installed on the filesystem, but
ultimately everything you ever do involves files in some way, so I
don't see that as a very compelling argument. These operations on
extensions don't let you modify the filesystem in any way, and they
only let you read from carefully sandboxed things that are designed
for precisely that purpose, so the system administrator really already
has good control. The sorts of things I'd include in this category are
things like server-side COPY FROM or COPY TO. When we come to the
third thing the patch includes in this category, creating and dropping
event triggers, I *really* don't understand why that one is considered
host security. That one isn't touching the filesystem even to the
extent that the extension stuff is; it seems to me to be purely
internal to the database. Yeah, OK, that could involve writing files
because we make catalog entries, but so could any DDL. Now, maybe
there's a theory of operation that you have in mind that makes this
all make more sense the way you have it, but if so, it seems not to be
spelled out anywhere in the patch itself or the commit message you
wrote for it, so I'm in the dark.
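
For concreteness, this is the sort of operation I mean by server-side COPY;
it writes a file on the server's own filesystem, and today requires being a
superuser or a member of pg_write_server_files (table and path
hypothetical):

    COPY audit_log TO '/var/lib/postgresql/exports/audit_log.csv'
        WITH (FORMAT csv, HEADER true);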

I also tend to think that functions like pg_read_file() ought to come
with execute permission pre-granted, with grant option, to
pg_host_security, and perhaps similarly for adminpack.
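
That is, roughly (a sketch; pg_host_security is the role proposed by the
patch set):

    GRANT EXECUTE ON FUNCTION pg_read_file(text)
        TO pg_host_security WITH GRANT OPTION;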

From the department of nitpicking, all four of your commit messages
begin with the word "Reducing" which I think should be just "Reduce".
Otherwise, at least to me, it doesn't look like a proper sentence.

0004 has this kind of thing all over the place:

-   /* Superusers bypass all permission checking. */
-   if (superuser_arg(roleid))
+   /*
+* Superusers and members of the pg_database_security role bypass all
+* permission checking.
+*/

If that were true, the pg_database_security role would be equivalent
to superuser, which isn't the intent, so I think the comment needs
more thought. Also, I think that somewhere in the patch, either in
code comments or 

Re: Rename of triggers for partitioned tables

2021-07-22 Thread Alvaro Herrera
On 2021-Jul-22, Arne Roland wrote:

> alter table middle disable trigger b;
> creates the same kind of inconsistency that
> alter trigger b on middle rename to something;
> does. In other words: enabling/disabling non-topmost triggers should be
> forbidden as well.

I'm not so sure about that ... I think enabling/disabling triggers on
individual partitions is a valid use case.  Renaming them, not so much.
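
For example, temporarily silencing a trigger on a single partition during
bulk maintenance (names hypothetical):

    ALTER TABLE ONLY measurements_2021_07 DISABLE TRIGGER audit_trig;
    -- ... bulk load / cleanup on that partition ...
    ALTER TABLE ONLY measurements_2021_07 ENABLE TRIGGER audit_trig;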

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"No necesitamos banderas
 No reconocemos fronteras"  (Jorge González)




Re: Rename of triggers for partitioned tables

2021-07-22 Thread Arne Roland
From: Alvaro Herrera 
Sent: Thursday, July 22, 2021 19:33
To: Arne Roland
Subject: Re: Rename of triggers for partitioned tables

On 2021-Jul-22, Arne Roland wrote:

> Since it is sort of the same problem, I think it might be worthwhile
> to address it as well within this patch. Adding two to four ereports
> doesn't sound like scope creep to me, even though it touches
> completely different code. I'll look into that as well.

I don't understand what you mean.  But here's an updated patch, with the
following changes

alter table middle disable trigger b;
creates the same kind of inconsistency that
alter trigger b on middle rename to something;
does. In other words: enabling/disabling non-topmost triggers should be
forbidden as well.

Regards
Arne


Re: Rename of triggers for partitioned tables

2021-07-22 Thread Alvaro Herrera
On 2021-Jul-22, Arne Roland wrote:

> I just noticed that the
> ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ ENABLE | DISABLE ] TRIGGER ...
> syntax, albeit looking totally different (and it already recurses),
> has precisely the same issue with pg_dump. In that case the ONLY
> keyword isn't optional, and point 2 from your post above inevitably
> applies.
> 
> Since it is sort of the same problem, I think it might be worthwhile
> to address it as well within this patch. Adding two to four ereports
> doesn't sound like scope creep to me, even though it touches
> completely different code. I'll look into that as well.

Oh!  For some reason I failed to notice that you were talking about
ENABLE / DISABLE, not RENAME anymore.  It turns out that I recently
spent some time in this area; see commits below (these are from branch
REL_11_STABLE).  Are you talking about something that needs to be handled
*beyond* these fixes?

commit ccfc3cbb341abf515d930097c4851d9e2152504f
Author: Álvaro Herrera
AuthorDate: Fri Jul 16 17:29:22 2021 -0400
CommitDate: Fri Jul 16 17:29:22 2021 -0400

Fix pg_dump for disabled triggers on partitioned tables

pg_dump failed to preserve the 'enabled' flag (which can be not only
disabled, but also REPLICA or ALWAYS) for partitions which had it
changed from their respective parents.  Attempt to handle that by
including a definition for such triggers in the dump, but replace the
standard CREATE TRIGGER line with an ALTER TRIGGER line.

Backpatch to 11, where these triggers can exist.  In branches 11 and 12,
pick up a few test lines from commit b9b408c48724 to verify that
pg_upgrade is okay with these arrangements.

Co-authored-by: Justin Pryzby 
Co-authored-by: Álvaro Herrera 
Discussion: https://postgr.es/m/20200930223450.ga14...@telsasoft.com

commit fed35bd4a65032f714af6bc88c102d0e90422dee
Author: Álvaro Herrera
AuthorDate: Fri Jul 16 13:01:43 2021 -0400
CommitDate: Fri Jul 16 13:01:43 2021 -0400

Preserve firing-on state when cloning row triggers to partitions

When triggers are cloned from partitioned tables to their partitions,
the 'tgenabled' flag (origin/replica/always/disable) was not propagated.
Make it so that the flag on the trigger on partition is initially set to
the same value as on the partitioned table.

Add a test case to verify the behavior.

Backpatch to 11, where this appeared in commit 86f575948c77.

Author: Álvaro Herrera 
Reported-by: Justin Pryzby 
Discussion: https://postgr.es/m/20200930223450.ga14...@telsasoft.com

commit bbb927b4db9b3b449ccd0f76c1296de382a2f0c1
Author: Álvaro Herrera
AuthorDate: Tue Oct 20 19:22:09 2020 -0300
CommitDate: Tue Oct 20 19:22:09 2020 -0300

Fix ALTER TABLE .. ENABLE/DISABLE TRIGGER recursion

More precisely, correctly handle the ONLY flag indicating not to
recurse.  This was implemented in 86f575948c77 by recursing in
trigger.c, but that's the wrong place; use ATSimpleRecursion instead,
which behaves properly.  However, because legacy inheritance has never
recursed in that situation, make sure to do that only for new-style
partitioning.

I noticed this problem while testing a fix for another bug in the
vicinity.

This has been wrong all along, so backpatch to 11.

Discussion: https://postgr.es/m/20201016235925.GA29829@alvherre.pgsql

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: Rename of triggers for partitioned tables

2021-07-22 Thread Alvaro Herrera
On 2021-Jul-22, Arne Roland wrote:

> Since it is sort of the same problem, I think it might be worthwhile
> to address it as well within this patch. Adding two to four ereports
> doesn't sound like scope creep to me, even though it touches
> completely different code. I'll look into that as well.

I don't understand what you mean.  But here's an updated patch, with the
following changes

1. support for ONLY is removed, since evidently the only thing it is
good for is introducing inconsistencies

2. recursion to partitioned tables always occurs; it is no longer
conditional on relation->inh.  This is sensible because, due to point 1
above, inh can no longer be false.

3. renaming a trigger that's not topmost is forbidden.
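
A sketch of the resulting behavior (names hypothetical):

    CREATE TABLE p (a int) PARTITION BY LIST (a);
    CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1);
    CREATE FUNCTION tf() RETURNS trigger LANGUAGE plpgsql AS
        $$ BEGIN RETURN NULL; END $$;
    CREATE TRIGGER trg AFTER INSERT ON p
        FOR EACH ROW EXECUTE FUNCTION tf();
    -- recurses: renames the trigger on p and its clone on p1
    ALTER TRIGGER trg ON p RENAME TO trg_renamed;
    -- forbidden: the clone trigger on the partition is not topmost
    ALTER TRIGGER trg_renamed ON p1 RENAME TO oops;  -- ERROR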

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
>From bac7f7f824ef3622281a4aed0a8e658eed3fa1a8 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 21 Jul 2021 13:32:24 -0400
Subject: [PATCH v10] Make ALTER TRIGGER RENAME consistent for partitioned
 tables
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Previously renaming clone triggers on partitions was allowed, which is
pretty much useless and causes pg_dump to be inconsistent (because the
different name is not propagated); and also, when triggers were renamed
on partitioned tables, the partitions were not affected.  Make the
operation recurse to partitions, and also forbid renaming clone
triggers.

Co-authored-by: Arne Roland 
Co-authored-by: Álvaro Herrera 
Reviewed-by: Zhihong Yu 
Discussion: https://postgr.es/m/d0fd7040c2fb4de1a111b9d9ccc45...@index.de
---
 doc/src/sgml/ref/alter_trigger.sgml|  15 +-
 src/backend/commands/trigger.c | 215 -
 src/test/regress/expected/triggers.out |  76 +
 src/test/regress/sql/triggers.sql  |  47 ++
 4 files changed, 307 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ref/alter_trigger.sgml b/doc/src/sgml/ref/alter_trigger.sgml
index 43a7da4f0b..cec50e2245 100644
--- a/doc/src/sgml/ref/alter_trigger.sgml
+++ b/doc/src/sgml/ref/alter_trigger.sgml
@@ -31,9 +31,20 @@ ALTER TRIGGER name ON 
ALTER TRIGGER changes properties of an existing
-   trigger.  The RENAME clause changes the name of
+   trigger.
+  
+  
+  
+   The RENAME clause changes the name of
the given trigger without otherwise changing the trigger
-   definition.  The DEPENDS ON EXTENSION clause marks
+   definition.
+   If the table that the trigger is on is a partitioned table,
+   then corresponding clone triggers in the partitions are
+   renamed too.
+  
+
+  
+   The DEPENDS ON EXTENSION clause marks
the trigger as dependent on an extension, such that if the extension is
dropped, the trigger will automatically be dropped as well.
   
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6d4b7ee92a..e983b361ea 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -71,6 +71,12 @@ int			SessionReplicationRole = SESSION_REPLICATION_ROLE_ORIGIN;
 static int	MyTriggerDepth = 0;
 
 /* Local function prototypes */
+static void renametrig_internal(Relation tgrel, Relation targetrel,
+HeapTuple trigtup, const char *newname,
+const char *expected_name);
+static void renametrig_partition(Relation tgrel, Oid partitionId,
+ Oid parentTriggerOid, const char *newname,
+ const char *expected_name);
 static void SetTriggerFlags(TriggerDesc *trigdesc, Trigger *trigger);
 static bool GetTupleForTrigger(EState *estate,
 			   EPQState *epqstate,
@@ -1442,38 +1448,16 @@ renametrig(RenameStmt *stmt)
 	targetrel = relation_open(relid, NoLock);
 
 	/*
-	 * Scan pg_trigger twice for existing triggers on relation.  We do this in
-	 * order to ensure a trigger does not exist with newname (The unique index
-	 * on tgrelid/tgname would complain anyway) and to ensure a trigger does
-	 * exist with oldname.
-	 *
-	 * NOTE that this is cool only because we have AccessExclusiveLock on the
-	 * relation, so the trigger set won't be changing underneath us.
+	 * On partitioned tables, this operation recurses to partitions.  Lock all
+	 * tables upfront.
 	 */
+	if (targetrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		(void) find_all_inheritors(relid, AccessExclusiveLock, NULL);
+
 	tgrel = table_open(TriggerRelationId, RowExclusiveLock);
 
 	/*
-	 * First pass -- look for name conflict
-	 */
-	ScanKeyInit([0],
-Anum_pg_trigger_tgrelid,
-BTEqualStrategyNumber, F_OIDEQ,
-ObjectIdGetDatum(relid));
-	ScanKeyInit([1],
-Anum_pg_trigger_tgname,
-BTEqualStrategyNumber, F_NAMEEQ,
-PointerGetDatum(stmt->newname));
-	tgscan = systable_beginscan(tgrel, TriggerRelidNameIndexId, true,
-NULL, 2, key);
-	if (HeapTupleIsValid(tuple = systable_getnext(tgscan)))
-		ereport(ERROR,
-(errcode(ERRCODE_DUPLICATE_OBJECT),
- errmsg("trigger \"%s\" for relation \"%s\" already exists",
-		stmt->newname, 

Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-07-22 Thread Mark Dilger



> On Jul 22, 2021, at 8:29 AM, Robert Haas  wrote:
> 
> I don't think that we want to commit a patch to add a
> pg_logical_replication role that can "eventually" be made safe to
> delegate to non-superusers.

Certainly not.  What I meant on May 28 by "eventually" was that the patch set 
(posted May 25 and named "v3") had not yet implemented such security, as I was 
fishing for comments from the community about whether the basic division of 
superuser into these new roles was the right division.  Having gotten little 
feedback on that, on June 29 I posted another patch set (confusingly also named 
"v3", my apologies) in which patch 0001 had expanded to include new security 
restrictions.

> Whatever issues need to be fixed should be
> fixed first, and then this change can be considered afterwards. It
> seems like you try to fix at least some of the issues in the patch,
> because I see permission checks being added in
> src/backend/replication/logical/worker.c, and I don't think that
> should happen in the same patch that adds the new predefined role.

Prior to this patch, the logical replication workers run under the userid of 
the owner of the subscription.  This is unchanged after the patch.  The real 
difference is that prior to the patch, only superusers can own subscriptions, 
so checking permissions on tables during replication would be silly (though not 
harmful).  The worker is assured of passing all such permission checks by 
virtue of being a superuser.  After the patch, since subscription owners need 
not be superusers, the permission checks are no longer silly.  There is no 
assurance that they have permission to apply changes to a table, so naturally 
that has to be checked, and it is. 

I don't really see this as two separate patches, since the addition of the 
permissions checks without the addition of non-superusers as logical 
replication workers is silly.  But I don't mind that much, either.  I'll break 
them in two for the next patch set.

> I
> also think it should be accompanied not only by new test cases (which
> you seem to have added, though I have not reviewed them in detail) but
> also documentation changes (which seem to be missing, since the doc
> changes are all about the new predefined role). This is a really
> significant behavior change to logical replication IMV and shouldn't
> just be slipped into some other patch.

I'm not sure what is meant by "slipped into some other patch", but I *think* 
you mean that the documentation changes should not be in a separate patch from 
the behavioral changes.  I agree with that.  I'll add documentation of the 
changes to logical replication in the same patch as the changes themselves.

> It also seems based on Noah's comments and your response that there
> might be some other issue here, and I haven't understood what that is,
> but I think that should also be fixed separately, and first.
> Considering all this, I would suggest not having this be patch #1 in
> your series; make something come first that doesn't have
> prerequisites.

The issue that gets thrown around in the email archive is that "arbitrary code" 
can be made to run on the subscriber side.  As I understand the problem, this 
is because trigger functions can be created on tables with arbitrary code in 
them, and that code will be executed under the userid of the user who causes 
the trigger to fire during an insert/update/delete rather than as the user who 
created the trigger.  This of course is not peculiar to logical replication; it 
is how triggers work generally.  What is peculiar is that a non-superuser who 
can create tables, triggers, publications and subscriptions can get the logical 
replication worker to perform inserts/updates/deletes on those tables, thereby 
firing those triggers, and executing the trigger code as superuser.  That is 
ordinarily not something that a user can do simply by creating a table with a 
trigger, since there would be no mechanism to force the superuser to perform 
operations on the table.
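
To make the shape of the attack concrete, a sketch (names hypothetical;
"intruder" is a role the attacker already controls, and ENABLE ALWAYS is
what makes the trigger fire during replication apply):

    -- as an unprivileged user on the subscriber:
    CREATE TABLE t (a int);
    CREATE FUNCTION gotcha() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
        -- runs with the privileges of whoever performs the INSERT;
        -- under the old rules, that is the superuser apply worker
        ALTER ROLE intruder SUPERUSER;
        RETURN NEW;
    END $$;
    CREATE TRIGGER t_trig BEFORE INSERT ON t
        FOR EACH ROW EXECUTE FUNCTION gotcha();
    ALTER TABLE t ENABLE ALWAYS TRIGGER t_trig;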

After patch 0001 (which will be split in the next patch set, but hasn't been 
split yet) the user who creates the subscription is also the user whose 
permissions are checked when operating on the table and executing the trigger.  
This closes the security hole, so far as I am aware.  I would very much like 
more eyeballs on this patch, and if anybody sees why this is an insufficient 
solution, please speak up.  But it's not as if I punted the security issue down 
the road to some ill-defined future patch.  On the contrary, this patch both 
creates the ability to delegate subscription creation authority to a 
non-superuser and closes the security hole which that would otherwise entail, 
or at least, that is the intent.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: truncating timestamps on arbitrary intervals

2021-07-22 Thread John Naylor
On Thu, Jul 22, 2021 at 12:24 PM Bauyrzhan Sakhariyev <
baurzhansahar...@gmail.com> wrote:
>
> Is date_bin supposed to return the beginning of the bin?

Thanks for testing! And yes.

> And does the sign of an interval define the "direction" of the bin?

No, the boundary is intentionally the earlier one:

/*
 * Make sure the returned timestamp is at the start of the bin, even if
 * the origin is in the future.
 */
	if (origin > timestamp && stride_usecs > 1)
		tm_delta -= stride_usecs;

I wonder if we should just disallow negative intervals here.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Rename of triggers for partitioned tables

2021-07-22 Thread Arne Roland

From: Alvaro Herrera 
Sent: Thursday, July 22, 2021 18:20
To: Arne Roland
Subject: Re: Rename of triggers for partitioned tables

If we have good use for such an index, I don't see why we can't add it.
But I'm not sure that it is justified -- certainly if the only benefit
is to make ALTER TRIGGER RENAME recurse faster on partitioned tables, it
is not justified.

Speed is not really the issue here; most people have under 20 triggers per
table anyway. I think even the simpler code would be worth more than the
speed gain. The main value of such an index would be the enforced uniqueness.
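
For reference, clone triggers and their parents can already be listed from
the catalog (tgparentid, added in v13, holds the OID of the parent
pg_trigger row, or 0 for non-clones):

    SELECT tgrelid::regclass AS partition, tgname, tgparentid
    FROM pg_trigger
    WHERE tgparentid <> 0;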

I just noticed that the
ALTER TABLE [ IF EXISTS ] [ ONLY ] name [ ENABLE | DISABLE ] TRIGGER ...
syntax, albeit looking totally different (and it already recurses), has
precisely the same issue with pg_dump. In that case the ONLY keyword isn't
optional, and point 2 from your post above inevitably applies.

Since it is sort of the same problem, I think it might be worthwhile to address 
it as well within this patch. Adding two to four ereports doesn't sound like 
scope creep to me, even though it touches completely different code. I'll
look into that as well.

Regards
Arne



Re: refactoring basebackup.c

2021-07-22 Thread tushar

On 7/19/21 8:29 PM, Dilip Kumar wrote:

I am not sure why this is working; from the code I could not find
whether, if the backup target is server, we are doing anything with the
-R option or just silently ignoring it.


OK, in another scenario I can see "-t server" working with the
"--server-compression" option but not with -z or -Z?


"-t  server" with option "-z"  / or (-Z )

[tushar@localhost bin]$ ./pg_basebackup -t server:/tmp/dataN -Xnone  -z  
--no-manifest -p 9033

pg_basebackup: error: only tar mode backups can be compressed
Try "pg_basebackup --help" for more information.

[tushar@localhost bin]$ ./pg_basebackup -t server:/tmp/dataNa -Z 1
-Xnone  --server-compression=gzip4  --no-manifest -p 9033

pg_basebackup: error: only tar mode backups can be compressed
Try "pg_basebackup --help" for more information.

"-t server" with "server-compression"  (working)

[tushar@localhost bin]$ ./pg_basebackup -t server:/tmp/dataN -Xnone  
--server-compression=gzip4  --no-manifest -p 9033
NOTICE:  WAL archiving is not enabled; you must ensure that all required 
WAL segments are copied through other means to complete the backup

[tushar@localhost bin]$

--
regards, tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: truncating timestamps on arbitrary intervals

2021-07-22 Thread Bauyrzhan Sakhariyev
Is date_bin supposed to return the beginning of the bin? And does the sign
of an interval define the "direction" of the bin?
Judging by the results of queries #1 and #2, the sign of the interval
decides the direction the timestamp gets shifted in (in both cases
ts < origin), but when ts > origin (queries #3 and #4) the interval sign
doesn't matter; specifically, #4 doesn't return the 6th of January.

1. SELECT date_bin('-2 days'::interval, timestamp '2001-01-01 00:00:00',
                   timestamp '2001-01-04 00:00:00');  -- 2001-01-02 00:00:00
2. SELECT date_bin('2 days'::interval, timestamp '2001-01-01 00:00:00',
                   timestamp '2001-01-04 00:00:00');  -- 2000-12-31 00:00:00
3. SELECT date_bin('2 days'::interval, timestamp '2001-01-04 00:00:00',
                   timestamp '2001-01-01 00:00:00');  -- 2001-01-03 00:00:00
4. SELECT date_bin('-2 days'::interval, timestamp '2001-01-04 00:00:00',
                   timestamp '2001-01-01 00:00:00');  -- 2001-01-03 00:00:00

On Thu, Jul 22, 2021 at 6:21 PM John Naylor 
wrote:

> Hi,
>
> When analyzing time-series data, it's useful to be able to bin
> timestamps into equally spaced ranges. date_trunc() is only able to
> bin on a specified whole unit. In the attached patch for the March
> commitfest, I propose a new function date_trunc_interval(), which can
> truncate to arbitrary intervals, e.g.:
>
> select date_trunc_interval('15 minutes', timestamp '2020-02-16 20:48:40');
>  date_trunc_interval
> ---------------------
>  2020-02-16 20:45:00
> (1 row)
>
> With this addition, it might be possible to turn the existing
> date_trunc() functions into wrappers. I haven't done that here because
> it didn't seem practical at this point. For one, the existing
> functions have special treatment for weeks, centuries, and millennia.
>
> Note: I've only written the implementation for the type timestamp
> without timezone. Adding timezone support would be pretty simple, but
> I wanted to get feedback on the basic idea first before making it
> complete. I've also written tests and very basic documentation.
>
> --
> John Naylor                  https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Rename of triggers for partitioned tables

2021-07-22 Thread Alvaro Herrera
On 2021-Jul-22, Arne Roland wrote:

> 
> Hi,
> 
> Looking at the patch, I realized that renametrig_partition could use an
> index leading with tgparentid, without the need to traverse the child
> tables. Since we still need to lock them, there is likely no practical
> performance gain. But I am surprised there is no unique index on
> (tgparentid, tgrelid), which sounds like a decent sanity check to have
> anyway.

If we have good use for such an index, I don't see why we can't add it.
But I'm not sure that it is justified -- certainly if the only benefit
is to make ALTER TRIGGER RENAME recurse faster on partitioned tables, it
is not justified.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)




Re: pg_amcheck: Fix block number parsing on command line

2021-07-22 Thread Mark Dilger



> On Jul 22, 2021, at 7:56 AM, Peter Eisentraut 
>  wrote:
> 
> Please check that it's up to speed.
> <0001-pg_amcheck-Fix-block-number-parsing-on-command-line.patch>

This looks correct to me.  Thanks for the fix.

Your use of strtoul compares favorably to that in pg_resetwal, in that you
check errno and it does not.  The consequence is:

bin % ./pg_resetwal/pg_resetwal -e -1
pg_resetwal: error: transaction ID epoch (-e) must not be -1
bin % ./pg_resetwal/pg_resetwal -e junkstring
pg_resetwal: error: invalid argument for option -e
Try "pg_resetwal --help" for more information.

Unless people are relying on this behavior, I would think pg_resetwal should 
complain of an invalid argument for both of those, rather than complaining 
about -1.  That's not to do with this patch, but if we're tightening up the use 
of strtol in frontend tools, maybe we can use the identical logic in both 
places.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: proposal: enhancing plpgsql debug API - returns text value of variable content

2021-07-22 Thread Pavel Stehule
On Thu, Jul 22, 2021 at 14:54, Aleksander Alekseev <aleksan...@timescale.com> wrote:

> Hi Pavel,
>
> >> I am sending an enhanced patch about the regress test for plpgsql's
> debug API.
>
> Thanks for the test! I noticed some little issues with formatting and
> typos. The corrected patch is attached.
>
> > override CPPFLAGS := $(CPPFLAGS) -I$(top_srcdir)/src/pl/plpgsql/src
>
> You probably already noticed, but for the record - AppVeyor doesn't seem
> to be happy still [1]:
>
> ```
> src/test/modules/test_dbgapi/test_dbgapi.c(17): fatal error C1083: Cannot
> open include file: 'plpgsql.h': No such file or directory
> [C:\projects\postgresql\test_dbgapi.vcxproj]
>
> ```
>
> [1]:
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.141500
>

I know about it; the attached patch tries to fix this issue.

I merged your patch (thank you).

Regards

Pavel



>
> --
> Best regards,
> Aleksander Alekseev
>
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 14bbe12da5..e0e68a2592 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -4059,6 +4059,8 @@ plpgsql_estate_setup(PLpgSQL_execstate *estate,
 	{
 		(*plpgsql_plugin_ptr)->error_callback = plpgsql_exec_error_callback;
 		(*plpgsql_plugin_ptr)->assign_expr = exec_assign_expr;
+		(*plpgsql_plugin_ptr)->eval_datum = exec_eval_datum;
+		(*plpgsql_plugin_ptr)->cast_value = do_cast_value;
 
 		if ((*plpgsql_plugin_ptr)->func_setup)
 			((*plpgsql_plugin_ptr)->func_setup) (estate, func);
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 0f6a5b30b1..abe7d63f78 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -415,6 +415,7 @@ pl_block		: decl_sect K_BEGIN proc_sect exception_sect K_END opt_label
 		new->cmd_type	= PLPGSQL_STMT_BLOCK;
 		new->lineno		= plpgsql_location_to_lineno(@2);
 		new->stmtid		= ++plpgsql_curr_compile->nstatements;
+		new->ns			= plpgsql_ns_top();
 		new->label		= $1.label;
 		new->n_initvars = $1.n_initvars;
 		new->initvarnos = $1.initvarnos;
@@ -907,7 +908,8 @@ stmt_perform	: K_PERFORM
 		new = palloc0(sizeof(PLpgSQL_stmt_perform));
 		new->cmd_type = PLPGSQL_STMT_PERFORM;
 		new->lineno   = plpgsql_location_to_lineno(@1);
-		new->stmtid = ++plpgsql_curr_compile->nstatements;
+		new->stmtid   = ++plpgsql_curr_compile->nstatements;
+		new->ns		  = plpgsql_ns_top();
 		plpgsql_push_back_token(K_PERFORM);
 
 		/*
@@ -943,6 +945,7 @@ stmt_call		: K_CALL
 		new->cmd_type = PLPGSQL_STMT_CALL;
 		new->lineno = plpgsql_location_to_lineno(@1);
 		new->stmtid = ++plpgsql_curr_compile->nstatements;
+		new->ns = plpgsql_ns_top();
 		plpgsql_push_back_token(K_CALL);
 		new->expr = read_sql_stmt();
 		new->is_call = true;
@@ -962,6 +965,7 @@ stmt_call		: K_CALL
 		new->cmd_type = PLPGSQL_STMT_CALL;
 		new->lineno = plpgsql_location_to_lineno(@1);
 		new->stmtid = ++plpgsql_curr_compile->nstatements;
+		new->ns = plpgsql_ns_top();
 		plpgsql_push_back_token(K_DO);
 		new->expr = read_sql_stmt();
 		new->is_call = false;
@@ -1000,7 +1004,8 @@ stmt_assign		: T_DATUM
 		new = palloc0(sizeof(PLpgSQL_stmt_assign));
 		new->cmd_type = PLPGSQL_STMT_ASSIGN;
 		new->lineno   = plpgsql_location_to_lineno(@1);
-		new->stmtid = ++plpgsql_curr_compile->nstatements;
+		new->stmtid   = ++plpgsql_curr_compile->nstatements;
+		new->ns		  = plpgsql_ns_top();
 		new->varno = $1.datum->dno;
 		/* Push back the head name to include it in the stmt */
 		plpgsql_push_back_token(T_DATUM);
@@ -1022,6 +1027,7 @@ stmt_getdiag	: K_GET getdiag_area_opt K_DIAGNOSTICS getdiag_list ';'
 		new->cmd_type = PLPGSQL_STMT_GETDIAG;
 		new->lineno   = plpgsql_location_to_lineno(@1);
 		new->stmtid	  = ++plpgsql_curr_compile->nstatements;
+		new->ns		  = plpgsql_ns_top();
 		new->is_stacked = $2;
 		new->diag_items = $4;
 
@@ -1194,6 +1200,7 @@ stmt_if			: K_IF expr_until_then proc_sect stmt_elsifs stmt_else K_END K_IF ';'
 		new->cmd_type	= PLPGSQL_STMT_IF;
 		new->lineno		= plpgsql_location_to_lineno(@1);
 		new->stmtid		= ++plpgsql_curr_compile->nstatements;
+		new->ns			= plpgsql_ns_top();
 		new->cond		= $2;
 		new->then_body	= $3;
 		new->elsif_list = $4;
@@ -1299,6 +1306,7 @@ stmt_loop		: opt_loop_label K_LOOP loop_body
 		new->cmd_type = PLPGSQL_STMT_LOOP;
 		new->lineno   = plpgsql_location_to_lineno(@2);
 		new->stmtid   = ++plpgsql_curr_compile->nstatements;
+		new->ns		  = plpgsql_ns_top();
 		new->label	  = $1;
 		new->body	  = $3.stmts;
 
@@ -1317,6 +1325,7 @@ stmt_while		: opt_loop_label K_WHILE expr_until_loop loop_body
 		new->cmd_type = PLPGSQL_STMT_WHILE;
 		new->lineno   = plpgsql_location_to_lineno(@2);
 		new->stmtid	  = ++plpgsql_curr_compile->nstatements;
+		new->ns		  = plpgsql_ns_top();
 		new->label	  = $1;
 			

Re: WIP: Relaxing the constraints on numeric scale

2021-07-22 Thread Dean Rasheed
On Wed, 21 Jul 2021 at 22:33, Tom Lane  wrote:
>
> I took a brief look at this and have a couple of quick suggestions:
>

Thanks for looking at this!

> * As you mention, keeping some spare bits in the typmod might come
> in handy some day, but as given this patch isn't really doing so.
> I think it might be advisable to mask the scale off at 11 bits,
> preserving the high 5 bits of the low-order half of the word for future
> use.  The main objection to that I guess is that it would complicate
> doing sign extension in TYPMOD_SCALE().  But it doesn't seem like we
> use that logic in any really hot code paths, so another instruction
> or three probably is not much of a cost.
>

Yeah, that makes sense, and it's worth documenting where the spare bits are.

Interestingly, gcc recognised the bit hack I used for sign extension
and turned it into (x << 21) >> 21 using x86 shl and sar instructions,
though I didn't write it that way because apparently that's not
portable.

> * I agree with wrapping the typmod construction/extraction into macros
> (or maybe they should be inline functions?) but the names you chose
> seem generic enough to possibly confuse onlookers.  I'd suggest
> changing TYPMOD to NUMERIC_TYPMOD or NUM_TYPMOD.  The comment for them
> should probably also explicitly explain "For purely historical reasons,
> VARHDRSZ is added to the typmod value after these fields are combined",
> or words to that effect.
>

I've turned them into inline functions, since that makes them easier
to read, and debug if necessary.

All your other suggestions make sense too. Attached is a new version.
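
For illustration, a usage sketch of what the patch permits, matching the
documentation changes below (table name hypothetical):

    CREATE TABLE t (x numeric(2,-3), y numeric(3,5));
    INSERT INTO t VALUES (12345, 0.0042815);
    SELECT x, y FROM t;  -- 12000 | 0.00428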

Regards,
Dean
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
new file mode 100644
index e016f96..6abda2f
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -545,8 +545,8 @@
 
 NUMERIC(precision, scale)
 
- The precision must be positive, the scale zero or positive.
- Alternatively:
+ The precision must be positive, the scale may be positive or negative
+ (see below).  Alternatively:
 
 NUMERIC(precision)
 
@@ -578,11 +578,41 @@ NUMERIC
 
  If the scale of a value to be stored is greater than the declared
  scale of the column, the system will round the value to the specified
- number of fractional digits.  Then, if the number of digits to the
- left of the decimal point exceeds the declared precision minus the
- declared scale, an error is raised.
+ number of fractional digits.  If the declared scale of the column is
+ negative, the value will be rounded to the left of the decimal point.
+ If, after rounding, the number of digits to the left of the decimal point
+ exceeds the declared precision minus the declared scale, an error is
+ raised.  Similarly, if the declared scale exceeds the declared precision
+ and the number of zero digits to the right of the decimal point is less
+ than the declared scale minus the declared precision, an error is raised.
+ For example, a column declared as
+
+NUMERIC(3, 1)
+
+ will round values to 1 decimal place and be able to store values between
+ -99.9 and 99.9, inclusive.  A column declared as
+
+NUMERIC(2, -3)
+
+ will round values to the nearest thousand and be able to store values
+ between -99000 and 99000, inclusive.  A column declared as
+
+NUMERIC(3, 5)
+
+ will round values to 5 decimal places and be able to store values between
+ -0.00999 and 0.00999, inclusive.
 
 
+
+ 
+  The scale in a NUMERIC type declaration may be any value in
+  the range -1000 to 1000.  (The SQL standard requires
+  the scale to be in the range 0 to precision.
+  Using values outside this range may not be portable to other database
+  systems.)
+ 
+
+
 
  Numeric values are physically stored without any extra leading or
  trailing zeroes.  Thus, the declared precision and scale of a column
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
new file mode 100644
index 2a0f68f..46cb37c
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -816,6 +816,52 @@ numeric_is_integral(Numeric num)
 }
 
 /*
+ * make_numeric_typmod() -
+ *
+ *	Pack numeric precision and scale values into a typmod.  The upper 16 bits
+ *	are used for the precision (though actually not all these bits are needed,
+ *	since the maximum allowed precision is 1000).  The lower 16 bits are for
+ *	the scale, but since the scale is constrained to the range [-1000, 1000],
+ *	we use just the lower 11 of those 16 bits, and leave the remaining 5 bits
+ *	unset, for possible future use.
+ *
+ *	For purely historical reasons VARHDRSZ is then added to the result, thus
+ *	the unused space in the upper 16 bits is not all as freely available as it
+ *	might seem.
+ */
+static inline int
+make_numeric_typmod(int precision, int scale)
+{
+	return ((precision << 16) | (scale & 0x7ff)) + VARHDRSZ;
+}
+
+/*
+ * 

Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-07-22 Thread Robert Haas
On Fri, May 28, 2021 at 1:42 PM Mark Dilger
 wrote:
> > pg_logical_replication would not be safe to delegate that way:
> > https://postgr.es/m/flat/CACqFVBbx6PDq%2B%3DvHM0n78kHzn8tvOM-kGO_2q_q0zNAMT%2BTzdA%40mail.gmail.com
>
> Oh, I agree that this patch set does not go the extra step to make it safe.  
> You are quite right to push back, as my email was poorly worded.  I should 
> have said "intended to be eventually made safe to delegate". The idea is that 
> the patch set addresses most places in the sources where we test for 
> superuser and tests instead for (superuser || ), and then uses 
> that same set of roles to control who has sufficient privileges to set GUCs.  
> The pg_host_security and pg_network_security roles are not intended to 
> eventually be safe to delegate.  Or at least, I can't see any clear path to 
> getting there.  The pg_database_security and pg_logical_replication roles 
> should be ones we can make safe.  If we can agree as a community which set of 
> roles are appropriate, then we can have separate patches as needed for 
> tightening the security around them.

I don't think that we want to commit a patch to add a
pg_logical_replication role that can "eventually" be made safe to
delegate to non-superusers. Whatever issues need to be fixed should be
fixed first, and then this change can be considered afterwards. It
seems like you try to fix at least some of the issues in the patch,
because I see permission checks being added in
src/backend/replication/logical/worker.c, and I don't think that
should happen in the same patch that adds the new predefined role. I
also think it should be accompanied not only by new test cases (which
you seem to have added, though I have not reviewed them in detail) but
also documentation changes (which seem to be missing, since the doc
changes are all about the new predefined role). This is a really
significant behavior change to logical replication IMV and shouldn't
just be slipped into some other patch.

It also seems based on Noah's comments and your response that there
might be some other issue here, and I haven't understood what that is,
but I think that should also be fixed separately, and first.
Considering all this, I would suggest not having this be patch #1 in
your series; make something come first that doesn't have
prerequisites.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: psql - add SHOW_ALL_RESULTS option

2021-07-22 Thread Pavel Stehule
On Thu, Jul 22, 2021 at 17:23, Pavel Stehule wrote:

>
>
> On Thu, Jul 22, 2021 at 16:58, Fabien COELHO wrote:
>
>>
>> >> Ok. I noticed. The patch got significantly broken by the watch pager
>> >> commit. I also have to enhance the added tests (per Peter request).
>> >
>> > I wrote a test to check psql query cancel support.  I checked that it
>> fails
>> > against the patch that was reverted.  Maybe this is useful.
>>
>> Here is the updated version (v8? I'm not sure what the right count is),
>> which works for me and for "make check", including some tests added for
>> uncovered paths.
>>
>> I included your tap test (thanks again!) with some more comments and
>> cleanup.
>>
>> I tested manually for the pager feature, which mostly works, although
>> "pspg --stream" does not seem to expect two tables, or maybe there is a
>> way to switch between these that I have not found.
>>
>
> pspg doesn't support this feature. Theoretically it could be implemented
> (I am able to hold two datasets now), but without any help with
> synchronization I don't want to implement any more complex parsing. On the
> pspg side I am not able to detect which is the first result in the batch
> and which is the last (without some hard heuristics; I could probably read
> some information from the timestamps). And if you need two or more results
> in one terminal, then the mode without a pager is better.
>

But the timestamps are localized, and again I don't have enough information
on the pspg side for correct parsing.

So until psql uses some tags that allow simpler detection of the start and
end of a batch or relation, this feature will not be supported by pspg :-/.
There are some invisible ASCII codes that could be used for this purpose.


>
>
>> --
>> Fabien.
>
>


Re: Followup Timestamp to timestamp with TZ conversion

2021-07-22 Thread Tom Lane
Peter Volk  writes:
> The problem is that I have a 60TB+ PG installation for which we need to
> modify all of the timestamp columns to timestamp with tz. The data in the
> columns are already in UTC so we can benefit from the patch listed above.
> Yet there are 2 cases in which we are having an issue.

> 1) Index rebuilds: The patch is only avoiding a rewrite of the table data
> but is not avoiding a rebuild of the indexes. Following the logic in the
> patch above this should also be avoidable under the same condition

I don't think that follows.  What we are concerned about when determining
whether a heap rewrite can be skipped is whether the stored heap entries
are bit-compatible between the two data types.  To decide that an index
rebuild is not needed, you'd need to further determine that their sort
orders are equivalent (for a btree index, or who-knows-what semantic
condition for other types of indexes).  We don't attempt to do that,
so index rebuilds are always needed.
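
For reference, the no-rewrite path under discussion looks like this (table
and column names hypothetical):

    SET timezone = 'UTC';
    -- the heap rewrite is skipped because the stored bits are identical,
    -- but any index on created_at is still rebuilt:
    ALTER TABLE events ALTER COLUMN created_at TYPE timestamptz;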

As a thought experiment to prove that this is an issue, suppose that
somebody invented an unsigned integer type, and made the cast from
regular int4 follow the rules of a C cast, so that e.g. -1 becomes
2^32-1.  Given that, an ALTER TYPE from int4 to the unsigned type
could skip the heap rewrite.  But we absolutely would have to rebuild
any btree index on the column, because the sort ordering of the two
types is different.  OTOH, it's quite likely that a hash index would
not really need to be rebuilt.  So this is a real can of worms and
we've not cared to open it.

> 2) Partitioned tables with the timestamp as partition column: In this case
> the current version does not allow a modification of the column data type
> at all.

PG's partitioning features are still being built out, but I would not
hold my breath for that specific thing to change.  Again, the issue
is that bit-compatibility of individual values doesn't prove much
about comparison semantics, so it's not clear that a change of
data type still allows the value-to-partition assignment to be
identical.  (This is clearly an issue for RANGE partitioning.  Maybe
it could be finessed for HASH or LIST, but you'd still be needing
semantic assumptions that go beyond mere bit-compatibility of values.)
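
For example, with RANGE partitioning the bounds are interpreted using the
key column's comparison semantics, so altering the type of a partition key
column is simply rejected (sketch; names hypothetical):

    CREATE TABLE events (ts timestamp NOT NULL) PARTITION BY RANGE (ts);
    CREATE TABLE events_2021 PARTITION OF events
        FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
    ALTER TABLE events ALTER COLUMN ts TYPE timestamptz;  -- ERROR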

regards, tom lane




Re: psql - add SHOW_ALL_RESULTS option

2021-07-22 Thread Pavel Stehule
On Thu, Jul 22, 2021 at 16:58, Fabien COELHO wrote:

>
> >> Ok. I noticed. The patch got significantly broken by the watch pager
> >> commit. I also have to enhance the added tests (per Peter request).
> >
> > I wrote a test to check psql query cancel support.  I checked that it
> fails
> > against the patch that was reverted.  Maybe this is useful.
>
> Here is the updated version (v8? I'm not sure what the right count is),
> which works for me and for "make check", including some tests added for
> uncovered paths.
>
> I included your tap test (thanks again!) with some more comments and
> cleanup.
>
> I tested manually for the pager feature, which mostly works, although
> "pspg --stream" does not seem to expect two tables, or maybe there is a
> way to switch between these that I have not found.
>

pspg doesn't support this feature. Theoretically it could be implemented (I
am able to hold two datasets now), but without any help with
synchronization I don't want to implement any more complex parsing. On the
pspg side I am not able to detect which is the first result in the batch
and which is the last (without some hard heuristics; I could probably read
some information from the timestamps). And if you need two or more results
in one terminal, then the mode without a pager is better.



> --
> Fabien.


Re: Numeric x^y for negative x

2021-07-22 Thread Dean Rasheed
On Thu, 22 Jul 2021 at 06:13, Yugo NAGATA  wrote:
>
> Thank you for updating the patch. I am fine with the additional comments.
> I don't think there is any other problem left, so I marked it 
> Ready-for-Committers.
>

Thanks for looking. Barring any further comments, I'll push this in a few days.

Regards,
Dean




Re: psql - add SHOW_ALL_RESULTS option

2021-07-22 Thread Pavel Stehule
On Thu, 22 Jul 2021 at 16:49, Fabien COELHO wrote:

>
> Hello,
>
> > At minimum for PSQL_WATCH_PAGER, the pager should exit after some time,
> > but before it has to repeat reading data. Otherwise psql will hang.
>
> Sure. The "pager.pl" script I sent exits after reading a few lines.
>
> > Could a solution be a special psql mode, where psql writes to a logfile
> > and redirects to a file instead of using any (simplified) pager?
>
> I do not want a special psql mode, I just would like "make check" to tell
> me if I broke the PSQL_WATCH_PAGER feature after reworking the
> multi-results patch.
>
> > Theoretically, there is nothing special about the usage of a pager, and
> > you can just test redirecting to a file.
>
> I do not follow. From what I have seen the watch pager feature is somewhat
> different, and I'd like to be sure I'm not breaking anything.
>
> For your information, pspg does not seem to like being fed two results
>
>sh> PSQL_WATCH_PAGER="pspg --stream"
>psql> SELECT NOW() \; SELECT RANDOM() \watch 1
>
> The first table is shown, the second seems ignored.
>

pspg cannot show multi-table results, so this is not surprising. I am not
planning to support it. Unfortunately I cannot detect this situation and show
a warning, because psql doesn't send enough data for it. It would be nice if
psql sent some invisible characters to allow synchronization, but there is
nothing; I just detect the timestamp line and empty lines.



> --
> Fabien.
>


Re: Rename of triggers for partitioned tables

2021-07-22 Thread Arne Roland

Hi,

looking at the patch, I realized that renametrig_partition could use an index
leading with tgparentid, without the need to traverse the child tables. Since
we still need to lock them, there is likely no practical performance gain. But
I am surprised there is no unique index on (tgparentid, tgrelid), which sounds
like a decent sanity check to have anyway.


From: Alvaro Herrera 
Sent: Thursday, July 22, 2021 00:16
To: Arne Roland
Cc: vignesh C; Zhihong Yu; Pg Hackers
Subject: Re: Rename of triggers for partitioned tables

... now, thinking about how does pg_dump deal with this, I think this
thread is all wrong, or at least mostly wrong.  We cannot have triggers
in partitions with different names from the triggers in the parents.
pg_dump would only dump the trigger in the parent and totally ignore the
ones in children if they have different names.

That makes things simpler. The reasoning behind the ONLY variant was not to 
break things that worked in the past. If it never worked in the past, there is 
no reason to offer backwards compatibility.

1. if we detect that we're running in a trigger which has tgparentid
set, we have to raise an error,
2. if we are running on a trigger in a partitioned table, then ONLY must
not be specified.

I think that would make the whole idea of allowing ONLY obsolete altogether
anyway. I'd vote to simply scratch that syntax. I will work on updating your
version 9 of this patch accordingly.

I think we should do something about pg_upgrade. The inconsistency is obviously
broken. I'd like it to make sure every partitioned trigger is named correctly,
simply taking the name of the parent. Simplifying the state our database can
have might help us avoid future headaches.

Regards
Arne



Re: psql - add SHOW_ALL_RESULTS option

2021-07-22 Thread Fabien COELHO


Ok. I noticed. The patch got significantly broken by the watch pager 
commit. I also have to enhance the added tests (per Peter request).


I wrote a test to check psql query cancel support.  I checked that it fails 
against the patch that was reverted.  Maybe this is useful.


Here is the updated version (v8? I'm not sure what the right count is), 
which works for me and for "make check", including some tests added for 
uncovered paths.


I included your tap test (thanks again!) with some more comments and 
cleanup.


I tested manually for the pager feature, which mostly works, although
"pspg --stream" does not seem to expect two tables, or maybe there is a 
way to switch between these that I have not found.


--
Fabien.

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 40b5109b55..ccce72fb85 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -50,8 +50,28 @@ BEGIN \;
 SELECT 2.0 AS "float" \;
 SELECT 'world' AS "text" \;
 COMMIT;
+ float 
+---
+   2.0
+(1 row)
+
+ text  
+---
+ world
+(1 row)
+
 -- compound with empty statements and spurious leading spacing
 \;\;   SELECT 3 + 3 \;\;\;   SELECT ' ' || ' !' \;\;   SELECT 1 + 4 \;;
+ ?column? 
+--
+6
+(1 row)
+
+ ?column? 
+--
+   !
+(1 row)
+
  ?column? 
 --
 5
@@ -61,6 +81,11 @@ COMMIT;
 SELECT 1 + 1 + 1 AS "add" \gset
 SELECT :add + 1 + 1 AS "add" \;
 SELECT :add + 1 + 1 AS "add" \gset
+ add 
+-
+   5
+(1 row)
+
 -- set operator
 SELECT 1 AS i UNION SELECT 2 ORDER BY i;
  i 
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index fcab5c0d51..7c5504bc74 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -127,18 +127,11 @@ echo '\x \\ SELECT * FROM foo;' | psql
commands included in the string to divide it into multiple
transactions.  (See 
for more details about how the server handles multi-query strings.)
-   Also, psql only prints the
-   result of the last SQL command in the string.
-   This is different from the behavior when the same string is read from
-   a file or fed to psql's standard input,
-   because then psql sends
-   each SQL command separately.
   
   
-   Because of this behavior, putting more than one SQL command in a
-   single -c string often has unexpected results.
-   It's better to use repeated -c commands or feed
-   multiple commands to psql's standard input,
+   If having several commands executed in one transaction is not desired, 
+   use repeated -c commands or feed multiple commands to
+   psql's standard input,
either using echo as illustrated above, or
via a shell here-document, for example:
 
@@ -3542,10 +3535,6 @@ select 1\; select 2\; select 3;
 commands included in the string to divide it into multiple
 transactions.  (See 
 for more details about how the server handles multi-query strings.)
-psql prints only the last query result
-it receives for each request; in this example, although all
-three SELECTs are indeed executed, psql
-only prints the 3.
 
 
   
@@ -4132,6 +4121,18 @@ bar
   
 
   
+SHOW_ALL_RESULTS
+
+
+When this variable is set to off, only the last
+result of a combined query (\;) is shown instead of
+all of them.  The default is on.  The off behavior
+is for compatibility with older versions of psql.
+
+
+  
+
+   
 SHOW_CONTEXT
 
 
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 5640786678..564a4816de 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -33,6 +33,8 @@ static bool DescribeQuery(const char *query, double *elapsed_msec);
 static bool ExecQueryUsingCursor(const char *query, double *elapsed_msec);
 static bool command_no_begin(const char *query);
 static bool is_select_command(const char *query);
+static int SendQueryAndProcessResults(const char *query, double *pelapsed_msec,
+	bool is_watch, const printQueryOpt *opt, FILE *printQueryFout);
 
 
 /*
@@ -353,7 +355,7 @@ CheckConnection(void)
  * Returns true for valid result, false for error state.
  */
 static bool
-AcceptResult(const PGresult *result)
+AcceptResult(const PGresult *result, bool show_error)
 {
 	bool		OK;
 
@@ -384,7 +386,7 @@ AcceptResult(const PGresult *result)
 break;
 		}
 
-	if (!OK)
+	if (!OK && show_error)
 	{
 		const char *error = PQerrorMessage(pset.db);
 
@@ -472,6 +474,18 @@ ClearOrSaveResult(PGresult *result)
 	}
 }
 
+/*
+ * Consume all results
+ */
+static void
+ClearOrSaveAllResults()
+{
+	PGresult	*result;
+
+	while ((result = PQgetResult(pset.db)) != NULL)
+		ClearOrSaveResult(result);
+}
+
 
 /*
  * Print 

pg_amcheck: Fix block number parsing on command line

2021-07-22 Thread Peter Eisentraut


It seems to me that when using the pg_amcheck --startblock and 
--endblock options on platforms where sizeof(long) == 4, you cannot 
specify higher block numbers (unless you do tricks with negative 
numbers).  The attached patch should fix this by using strtoul() instead 
of strtol().  I also tightened up the number scanning a bit in other 
ways, similar to the code in other frontend utilities.  I know some 
people have been working on tightening all this up.  Please check that 
it's up to speed.
From d8bcc7cab6000acdaab69a81a57d7d6d41ae7bd5 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 22 Jul 2021 16:49:37 +0200
Subject: [PATCH] pg_amcheck: Fix block number parsing on command line

The previous code wouldn't handle higher block numbers on systems
where sizeof(long) == 4.
---
 src/bin/pg_amcheck/pg_amcheck.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/src/bin/pg_amcheck/pg_amcheck.c b/src/bin/pg_amcheck/pg_amcheck.c
index 4bde16fb4b..d18d38010d 100644
--- a/src/bin/pg_amcheck/pg_amcheck.c
+++ b/src/bin/pg_amcheck/pg_amcheck.c
@@ -297,6 +297,7 @@ main(int argc, char *argv[])
							long_options, &optindex)) != -1)
{
char   *endptr;
+   unsigned long optval;
 
switch (c)
{
@@ -407,30 +408,34 @@ main(int argc, char *argv[])
}
break;
case 7:
-				opts.startblock = strtol(optarg, &endptr, 10);
-				if (*endptr != '\0')
+				errno = 0;
+				optval = strtoul(optarg, &endptr, 10);
+				if (endptr == optarg || *endptr != '\0' || errno != 0)
 				{
 					pg_log_error("invalid start block");
 					exit(1);
 				}
-				if (opts.startblock > MaxBlockNumber || opts.startblock < 0)
+				if (optval > MaxBlockNumber)
 				{
 					pg_log_error("start block out of bounds");
 					exit(1);
 				}
+				opts.startblock = optval;
 				break;
 			case 8:
-				opts.endblock = strtol(optarg, &endptr, 10);
-				if (*endptr != '\0')
+				errno = 0;
+				optval = strtoul(optarg, &endptr, 10);
+				if (endptr == optarg || *endptr != '\0' || errno != 0)
 				{
 					pg_log_error("invalid end block");
 					exit(1);
 				}
-				if (opts.endblock > MaxBlockNumber || opts.endblock < 0)
+				if (optval > MaxBlockNumber)
 				{
 					pg_log_error("end block out of bounds");
 					exit(1);
 				}
+				opts.endblock = optval;
 				break;
case 9:
opts.rootdescend = true;
-- 
2.32.0



Re: psql - add SHOW_ALL_RESULTS option

2021-07-22 Thread Fabien COELHO



Hello,


At minimum for PSQL_WATCH_PAGER, the pager should exit after some time, but
before it has to repeat reading data. Otherwise psql will hang.


Sure. The "pager.pl" script I sent exits after reading a few lines.


Could a solution be a special psql mode, where psql writes to a logfile and
redirects to a file instead of using any (simplified) pager?


I do not want a special psql mode, I just would like "make check" to tell 
me if I broke the PSQL_WATCH_PAGER feature after reworking the 
multi-results patch.



Theoretically, there is nothing special about the usage of a pager, and you
can just test redirecting to a file.


I do not follow. From what I have seen the watch pager feature is somewhat
different, and I'd like to be sure I'm not breaking anything.


For your information, pspg does not seem to like being fed two results

  sh> PSQL_WATCH_PAGER="pspg --stream"
  psql> SELECT NOW() \; SELECT RANDOM() \watch 1

The first table is shown, the second seems ignored.

--
Fabien.




Re: row filtering for logical replication

2021-07-22 Thread Dilip Kumar
On Thu, Jul 22, 2021 at 5:15 PM Amit Kapila  wrote:
>
> On Tue, Jul 20, 2021 at 4:33 PM Dilip Kumar  wrote:
> >
> > On Tue, Jul 20, 2021 at 3:43 PM Tomas Vondra
> >  wrote:
> > >
> > > Do we log the TOAST-ed values that were not updated?
> >
> > No, we don't, I have submitted a patch sometime back to fix that [1]
> >
>
> That patch seems to log WAL for unchanged key columns. What about
> unchanged non-key columns? Do they get logged as part of the new tuple
> or is there some other way we can get those? If not, then we probably
> need to think of restricting the filter clause in some way.

But what sort of restrictions? I mean, we cannot restrict based on data type,
right? That would be too restrictive. The other option is to allow only
replica identity key columns in the filter condition?
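
To make the concern concrete, a sketch using the syntax proposed in this
thread (table and column names are illustrative):

    CREATE PUBLICATION pub FOR TABLE t WHERE (big_col LIKE '%x%');
    UPDATE t SET other_col = other_col + 1;  -- big_col untouched
    -- if big_col is TOAST-ed and unchanged, its value may not be present
    -- in the WAL record at all, so the filter cannot be evaluated here.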

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [BUG]Update Toast data failure in logical replication

2021-07-22 Thread Dilip Kumar
On Thu, Jul 22, 2021 at 4:11 PM Amit Kapila  wrote:
>
> On Thu, Jun 3, 2021 at 5:15 PM Dilip Kumar  wrote:
> >
> > On Wed, Jun 2, 2021 at 7:23 PM Dilip Kumar  wrote:
> > >
> > > On Wed, Jun 2, 2021 at 7:20 PM Kuntal Ghosh  
> > > wrote:
> > > >
> > > > On Wed, Jun 2, 2021 at 3:10 PM Dilip Kumar  
> > > > wrote:
> > > > >
> > > > > Yes, you are right.  I will change it in the next version, along with
> > > > > the test case.
> > > > >
> > > > +/*
> > > > + * if the key hasn't changed and we're only logging the key, we're 
> > > > done.
> > > > + * But if tuple has external data then we might have to detoast 
> > > > the key.
> > > > + */
> > > > This doesn't really mention why we need to detoast the key even when
> > > > the key remains the same. I guess we can add some more details here.
> > >
> > > Noted, let's see what others have to say about fixing this, then I
> > > will fix this along with one other pending comment and I will also add
> > > the test case.  Thanks for looking into this.
> >
> > I have fixed all the pending issues, I see there is already a test
> > case for this so I have changed the output for that.
> >
>
> IIUC, this issue occurs because we don't log the actual key value for
> unchanged toast key. It is neither logged as part of old_key_tuple nor
> for new tuple due to which we are not able to find it in the
> apply-side when we searched it via FindReplTupleInLocalRel. Now, I
> think it will work if we LOG the entire unchanged toasted value as you
> have done in the patch but can we explore some other way to fix it. In
> the subscriber-side, can we detect that the key column has toasted
> value in the new tuple and try to first fetch it and then do the index
> search for the fetched toasted value? I am not sure about the
> feasibility of this but wanted to see if we can someway avoid logging
> unchanged toasted key value as that might save us from additional WAL.

Yeah, if we can do this then it would be a better approach; as you mentioned,
we could avoid logging unchanged toast key data. I will investigate this next
week and update the thread.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Outdated comment in get_agg_clause_costs

2021-07-22 Thread David Rowley
I noticed that get_agg_clause_costs still claims that it recursively
finds Aggrefs in the expression tree, but I don't think that's been
true since 0a2bc5d61.

I've attached a patch that adjusts the comment so it's more aligned with
what the function now does.

David


fix_outdated_comment_in_get_agg_clause_costs.patch
Description: Binary data


Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

2021-07-22 Thread Tom Lane
Peter Eisentraut  writes:
> On 14.07.21 18:26, Tom Lane wrote:
>> https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n593
>> case 's':
>> a = arg.p ? arg.p : "(null)";

> Similar here: 
> https://github.com/ensc/dietlibc/blob/master/lib/__v_printf.c#L188

I also took a look at μClibc, as well as glibc itself, and learned some
additional facts.  glibc's behavior is not just 'print "(null)" instead'.
It is 'print "(null)" if the field width allows at least six characters,
otherwise print nothing'.  μClibc is bug-compatible with this, but other
implementations seem to generally just substitute "(null)" for the input
string and run with that.  I'm inclined to side with the latter camp.
I'd rather see something like "(nu" than empty because the latter looks
too much like it might be correct output; so I think glibc is expending
extra code to produce a less-good result.
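
For illustration, a minimal sketch of the substitute-and-format approach
(treat the variable names as illustrative, not necessarily what snprintf.c
ends up using):

    case 's':
        strvalue = va_arg(args, char *);
        /* Silently substitute "(null)" for a NULL pointer before width
         * and precision are applied, so e.g. %.3s prints "(nu" instead
         * of printing nothing as glibc would. */
        if (strvalue == NULL)
            strvalue = "(null)";
        /* ... then format strvalue exactly like any other string ... */
        break;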

> I think unless we get counterexamples, this is all good.

Barring objections, I'll press ahead with changing snprintf.c to do this.

regards, tom lane




Followup Timestamp to timestamp with TZ conversion

2021-07-22 Thread Peter Volk
Hi,

this is a followup to a performance optimization during the conversion of a
column from timestamp to timestamp with time zone. The initial patch I am
referring to is this one:

https://git.postgresql.org/gitweb/?p=postgresql.git=commitdiff=3c59263#patch4

and the previous discussion on this list is this one:

https://www.postgresql.org/message-id/cansfx06xn-vpyxm+yxyflezk9twjtk3dfjcohoubtxng40m...@mail.gmail.com

The problem is that I have a 60TB+ PG installation for which we need to
modify all of the timestamp columns to timestamp with time zone. The data in
the columns is already in UTC, so we can benefit from the patch listed above.
Yet there are 2 cases in which we are having an issue.
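
For context, the no-rewrite case enabled by that patch looks roughly like
this (table and column names are illustrative):

    SET timezone = 'UTC';
    ALTER TABLE events
      ALTER COLUMN created_at TYPE timestamp with time zone;
    -- with the session timezone set to UTC the heap is not rewritten,
    -- but the two cases below still bite.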

1) Index rebuilds: The patch is only avoiding a rewrite of the table data
but is not avoiding a rebuild of the indexes. Following the logic in the
patch above this should also be avoidable under the same condition

2) Partitioned tables with the timestamp as partition column: In this case
the current version does not allow a modification of the column data type at
all. Yet, following the logic in the patch, this could also be allowed under
the same side condition, i.e. if no table rewrite is required.

Question: What chances do we have to get the optimisations from the patch
above also "promoted" to the other 2 cases I listed?

Cheers,
Peter


Re: Transactions and indexes

2021-07-22 Thread Chris Cleveland
>
> Are you implementing a new index AM or a new table AM? Discarding data
> based on something like a relevance score doesn't seem like something
> that either API provides for. Indexes in Postgres can be lossy, but
> that in itself doesn't change the result of queries.
>

(Sorry if this doesn't quote properly, I'm trying to figure out how to do
the quote-and-bottom-post thing in gmail).

My plan was to do an index AM alone, but I'm thinking that isn't going to
work. The goal is to do better full-text search in Postgres, fast, over
really large datasets.

Relevance scoring is like an ORDER BY score with a LIMIT. The code that
traverses the index needs to know both of these things in advance.

The GIN code doesn't cut it. I'm still trying to understand the code for
the RUM index type, but it's slow going.

Suggestions on how to go about this are welcome.


Re: Transactions and indexes

2021-07-22 Thread Chris Cleveland
Thank you. Does this mean I can implement the index AM and return TIDs
without having to worry about transactions at all?

Also, as far as I can tell, the only way that TIDs are removed from the
index is in ambulkdelete(). Is this accurate? Does that mean that my index
will be returning TIDs for deleted items and I don't have to worry about
that either?

Don't TIDs get reused? What happens when my index returns an old TID which
is now pointing to a new record?

This is going to make it really hard to implement Top X queries of the type
you get from a search engine. A search engine will normally maintain an
internal buffer (usually a priority queue) of a fixed size, X, and add
tuples to it along with their relevance score. The buffer only remembers
the Top X tuples with the highest score. In this way the search engine can
iterate over millions of entries and retain only the best ones without
having an unbounded buffer. For this to work, though, you need to know how
many tuples to keep in the buffer in advance. If my index can't know, in
advance, which TIDs are invisible or deleted, then it can't keep them out
of the buffer, and this whole scheme fails.
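
To make that concrete, here is a rough sketch of such a bounded top-K buffer
(purely illustrative; this is not an existing PostgreSQL API):

    #include "postgres.h"
    #include "storage/itemptr.h"

    typedef struct TopKHit
    {
        double          score;
        ItemPointerData tid;
    } TopKHit;

    /* Min-heap of the K best hits seen so far; heap[0] is the worst of
     * them, so a new candidate enters only if it beats heap[0]. */
    static void
    topk_offer(TopKHit *heap, int *n, int k, TopKHit cand)
    {
        int     i;

        if (*n < k)
        {
            /* heap not yet full: sift the candidate up */
            i = (*n)++;
            while (i > 0 && heap[(i - 1) / 2].score > cand.score)
            {
                heap[i] = heap[(i - 1) / 2];
                i = (i - 1) / 2;
            }
            heap[i] = cand;
        }
        else if (cand.score > heap[0].score)
        {
            /* beats the current worst: replace the root and sift down */
            i = 0;
            for (;;)
            {
                int     c = 2 * i + 1;

                if (c >= k)
                    break;
                if (c + 1 < k && heap[c + 1].score < heap[c].score)
                    c++;
                if (heap[c].score >= cand.score)
                    break;
                heap[i] = heap[c];
                i = c;
            }
            heap[i] = cand;
        }
    }

The trouble described above is that a dead or invisible TID entering this
buffer evicts a live one, which is why visibility would have to be known
before the buffer is filled, not after.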

This is not going to work unless the system gives the index a clear picture
of transactions, visibility, and deletes as they happen. Is this
information available?




On Mon, Jul 19, 2021 at 6:58 PM Peter Geoghegan  wrote:

> On Mon, Jul 19, 2021 at 4:31 PM Chris Cleveland
>  wrote:
> > I'm confused on how to handle transactions and visibility.
>
> In Postgres, indexes are not considered to be part of the logical
> database. They're just data structures that point to TIDs in the
> table. To an index, each TID is just another object -- it doesn't
> possess any built-in idea about MVCC.
>
> In practice the indexes may be able to surmise certain things about
> MVCC and versioning, as an optimization -- but that is all speculative
> and relies on cooperation from the table AM side. Also, the
> implementation of unique indexes knows more than zero about versions,
> since that's just necessary. These two cases may or may not be
> considered exceptions to the general rule. I suppose that it's a
> matter of perspective.
>
> > So... how do I handle this? Is there some way for me to implement my own
> storage manager that manages visibility?
>
> This is the responsibility of a table AM, not any one index AM. In
> general we assume that each table AM implements something very much
> like heapam's VACUUM implementation. Index AMs may also have
> opportunistic cleanup of their own, as an optimization (actually this
> is what I was referring to).
>
> Theoretically index AMs and table AMs are orthogonal things. How true
> that will be in a world with more than one mature table AM remains to
> be seen.
>
> --
> Peter Geoghegan
>


-- 
Chris Cleveland
312-339-2677 mobile


Re: BUG #15293: Stored Procedure Triggered by Logical Replication is Unable to use Notification Events

2021-07-22 Thread Davide Fasolo
Should Arthur's patch be included in PostgreSQL 14 Beta 4?


Re: Use quick select instead of qsort to get median

2021-07-22 Thread John Naylor
On Thu, Jul 22, 2021 at 8:07 AM houzj.f...@fujitsu.com wrote:
>
> Hi,
>
> When I was writing an extension which needs to get the median of an array,
> I tried to find out whether postgres provides some API that can do that. I
> found that all the places in postgres invoke qsort() and then take the
> middle element. I was wondering whether we can do better by using "quick
> select", and whether it is worth it.

> Attached is a POC patch for this idea. I did some simple performance
> tests; I can see about a 10% performance gain in this test[2].
>
> Thoughts ?

> 1.
> entry_dealloc
> ...
> /* Record the (approximate) median usage */
> if (i > 0)
> pgss->cur_median_usage = entries[i / 2]->counters.usage;

It might be useful to be more precise here, but it seems it would be
slower, too?

> ---test create index
> drop index quad_box_tbl_idx;
> CREATE INDEX quad_box_tbl_idx ON quad_box_tbl USING spgist(b);
>
> ---test results
> PATCH:
> Time: 2609.664 ms (00:02.610)
>
> HEAD:
> Time: 2903.765 ms (00:02.944)

That index type is pretty rare, as far as I know. That doesn't seem to be
quite enough motivation to change the qsort template. If the goal was to
improve the speed of "create spgist index", would this still be the best
approach? Also, there are other things under consideration that would add
complexity to the qsort template [1], and this would add even more.

Looking in the docs [2], we don't have a MEDIAN aggregate, but we do have
percentile_disc(), and quick select might help there, but I haven't looked.
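
For reference, the existing SQL-level spelling of a discrete median (table
and column names are illustrative):

    SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY x) AS median
    FROM t;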

[1] https://commitfest.postgresql.org/33/3038/
[2] https://www.postgresql.org/docs/14/functions-aggregate.html
--
John Naylor
EDB: http://www.enterprisedb.com


Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-22 Thread Alvaro Herrera
On 2021-Jul-21, Michael Paquier wrote:

> +/*
> + * option_parse_int
> + *
> + * Parse an integer for a given option.  Returns true if the parsing
> + * could be done with optionally *result holding the parsed value, and
> + * false on failure.
> + */

May I suggest for the second sentence something like "If the parsing is
successful, returns true and stores the result in *result if that's
given; if parsing fails, returns false"

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)




Re: window build doesn't apply PG_CPPFLAGS correctly

2021-07-22 Thread Alvaro Herrera
On 2021-Jul-22, Pavel Stehule wrote:

> On Thu, 22 Jul 2021 at 14:04, Andrew Dunstan wrote:

> > Almost everything in the Makefiles is not used by the MSVC build system.
> > Using this one seems likely to be quite difficult, since the syntax for
> > the MSVC compiler command line is very different, and furthermore the
> > MSVC build system doesn't know anything about how to use this setting.
> >
> > AFAICT PG_CPPFLAGS is only used by pgxs.
> >
> > You would need to tell us more about how your build process is working.
> 
> I need access to plpgsql.h at build time. This is only one dependency. When
> I build an extension, plpgsql.h is in a shared directory. But when I
> build a module for a test, the header files are not installed yet, so the
> build requires an include dir: -I$(top_srcdir)/src/pl/plpgsql/src

But Project.pm parses Makefiles and puts stuff into the MSVC buildsystem
file format; note David Rowley's patch that (among other things) removes
a bunch of ->AddIncludeDir calls by parsing PG_CPPFLAGS
https://postgr.es/m/CAApHDvpXoav0aZnsji-ZNdo=9txqawnwmsh44gyn8k7i2pr...@mail.gmail.com
which is probably apropos.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"I must say, I am absolutely impressed with what pgsql's implementation of
VALUES allows me to do. It's kind of ridiculous how much "work" goes away in
my code.  Too bad I can't do this at work (Oracle 8/9)."   (Tom Allison)
   http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php




Re: Detecting File Damage & Inconsistencies

2021-07-22 Thread Simon Riggs
On Wed, 14 Jul 2021 at 05:01, Amit Kapila  wrote:

> The patch is simple but its use doesn't seem to be very clear. You
> have mentioned its use for future PITR patches and Craig mentioned
> some use cases in logical decoding and it appears to me that to
> support the use cases mentioned by Craig, it is important to LOG this
> earlier than at commit time. As there are no details about how it will
> be used for PITR patches and whether such patch ideas are accepted, it
> makes it harder to judge the value of this patch.

> I think if we would have patches (even at WIP/POC stage) for the ideas
> you and Craig have in mind, it would have been much easier to see the
> value of this patch.

Fair question. This is one of a series of 4 independent patches I have
planned to provide enhanced information and/or recovery tools. (The
second one is already in this CF). This is an area I know lots about
and nobody else is working on, so I thought I would contribute. I've
not discussed this off-list with anyone else. So this is PITR as
recovery in a broad sense, not just that specific feature.

For this patch, the idea is to be able to go in either direction:
Data damage <--> User info

So if you know a user was an intruder, you can detect the damage they caused.
Or, if you detect damage, you can work out who caused it, work out if
they were an intruder and if so, detect what else they did.

The most important thing is to have the info available in WAL, nothing
is possible until that is available.
We already added an option to add this same info to log_line_prefix,
yet nobody said it wasn't useful there, or needed other uses to allow
the feature.
The two sources of info are designed to be able to be used in combination.

My experience of recovery scenarios is that you often have to build
custom search tools to make it work. It's hard to say whether you'll
want to track the user, the specific session, or even specific
transactions.

But I do understand the overall request, so I propose adding
* pg_waldump output for wal_sessioninfo data, if it exists
* pg_waldump --user=USERNAME as a filter on username
to demonstrate the use of this

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: window build doesn't apply PG_CPPFLAGS correctly

2021-07-22 Thread Andrew Dunstan


On 7/22/21 8:11 AM, Pavel Stehule wrote:
>
>
> On Thu, 22 Jul 2021 at 14:04, Andrew Dunstan wrote:
>
>
> On 7/22/21 12:06 AM, Pavel Stehule wrote:
> > Hi
> >
> > I tried to write test for plpgsql debug API, where I need to
> access to
> > plpgsql.h
> >
> > I have line
> >
> > PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plpgsql/src
> >
> > that is working well on unix, but it do nothing on windows
> >
> > [00:05:14] Project "C:\projects\postgresql\pgsql.sln" (1) is
> building
> > "C:\projects\postgresql\test_dbgapi.vcxproj" (87) on node 1 (default
> > targets).
> > [00:05:14] PrepareForBuild:
> > [00:05:14]   Creating directory ".\Release\test_dbgapi\".
> > [00:05:14]   Creating directory
> ".\Release\test_dbgapi\test_dbgapi.tlog\".
> > [00:05:14] InitializeBuildStatus:
> > [00:05:14]   Creating
> > ".\Release\test_dbgapi\test_dbgapi.tlog\unsuccessfulbuild" because
> > "AlwaysCreate" was specified.
> > [00:05:14] ClCompile:
> > [00:05:14]   C:\Program Files (x86)\Microsoft Visual Studio
> > 12.0\VC\bin\x86_amd64\CL.exe /c /Isrc/include
> /Isrc/include/port/win32
> > /Isrc/include/port/win32_msvc /Zi /nologo /W3 /WX- /Ox /D WIN32 /D
> > _WINDOWS /D __WINDOWS__ /D __WIN32__ /D
> WIN32_STACK_RLIMIT=4194304 /D
> > _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _WINDLL /D
> > _MBCS /GF /Gm- /EHsc /MD /GS /fp:precise /Zc:wchar_t /Zc:forScope
> > /Fo".\Release\test_dbgapi\\"
> /Fd".\Release\test_dbgapi\vc120.pdb" /Gd
> > /TC /wd4018 /wd4244 /wd4273 /wd4102 /wd4090 /wd4267
> /errorReport:queue
> > /MP src/test/modules/test_dbgapi/test_dbgapi.c
> > [00:05:14]   test_dbgapi.c
> > [00:05:16] src/test/modules/test_dbgapi/test_dbgapi.c(17): fatal
> error
> > C1083: Cannot open include file: 'plpgsql.h': No such file or
> > directory [C:\projects\postgresql\test_dbgapi.vcxproj]
> > [00:05:16] Done Building Project
> > "C:\projects\postgresql\test_dbgapi.vcxproj" (default targets)
> -- FAILED.
> > [00:05:16] Project "C:\projects\postgresql\pgsql.sln" (1) is
> building
> > "C:\projects\postgresql\test_ddl_deparse.vcxproj" (88) on node 1
> > (default targets).
> >
> > looks so PG_CPPFLAGS is not propagated to CPPFLAGS there.
> >
> >
>
> Almost everything in the Makefiles is not used by the MSVC buid
> system.
> Using this one seems likely to be quite difficult, since the
> syntax for
> the MSVC compiler command line is very different, and furthermore the
> MSVC build system doesn't know anything about how to use this setting.
>
> AFAICT PG_CPPFLAGS is only used by pgxs.
>
> You would need to tell us more about how your build process is
> working.
>
>
> I need access to plpgsql.h at build time. This is only one dependency.
> When I build an extension, plpgsql.h is in a shared directory.
> But when I build a module for a test, the header files are not
> installed yet, so the build requires an include dir:
> -I$(top_srcdir)/src/pl/plpgsql/src
>
>

If I understand correctly what you're doing, you probably need to add an
entry for your module to $contrib_extraincludes in
src/tools/msvc/Mkvcbuild.pm, e.g.

my $contrib_extraincludes = {
    'dblink'      => ['src/backend'],
    'test_dbgapi' => ['src/pl/plpgsql/src']
};


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: logical replication empty transactions

2021-07-22 Thread Ajin Cherian
On Thu, Jul 22, 2021 at 6:11 PM Peter Smith  wrote:
>
> Hi Ajin.
>
> I have reviewed the v8 patch and my feedback comments are below:
>
> //
>
> 1. Apply v8 gave multiple whitespace warnings.
>
> --
>
> 2. Commit comment - wording
>
> If (when processing a COMMIT / PREPARE message) we find there had been
> no other change for that transaction, then do not send the COMMIT /
> PREPARE message. This means that pgoutput will skip BEGIN / COMMIT
> or BEGIN PREPARE / PREPARE  messages for transactions that are empty.
>
> =>
>
> Shouldn't this also mention some other messages that may be skipped?
> - COMMIT PREPARED
> - ROLLBACK PREPARED
>

Updated.

> --
>
> 3. doc/src/sgml/logicaldecoding.sgml - wording
>
> @@ -884,11 +884,20 @@ typedef void (*LogicalDecodePrepareCB) (struct
> LogicalDecodingContext *ctx,
>The required commit_prepared_cb callback is called
>whenever a transaction COMMIT PREPARED has
> been decoded.
>The gid field, which is part of the
> -  txn parameter, can be used in this callback.
> +  txn parameter, can be used in this callback. The
> +  parameters prepare_end_lsn and
> +  prepare_time can be used to check if the plugin
> +  has received this PREPARE TRANSACTION command or 
> not.
> +  If yes, it can commit the transaction, otherwise, it can skip the 
> commit.
> +  The gid alone is not sufficient to determine 
> this
> +  because the downstream may already have a prepared transaction with the
> +  same identifier.
>
> =>
>
> Typo: Should that say "downstream node" instead of just "downstream" ?
>
> --

Updated.

>
> 4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_txn
> callback comment
>
> @@ -406,14 +413,38 @@ pgoutput_startup(LogicalDecodingContext *ctx,
> OutputPluginOptions *opt,
>
>  /*
>   * BEGIN callback
> + * Don't send BEGIN message here. Instead, postpone it until the first
> + * change. In logical replication, a common scenario is to replicate a set
> + * of tables (instead of all tables) and transactions whose changes were on
>
> =>
>
> Typo: "BEGIN callback" --> "BEGIN callback." (with the period).
>
> And, I think maybe it will be better if it has a separating blank line too.
>
> e.g.
>
> /*
>  * BEGIN callback.
>  *
>  * Don't send BEGIN 
>
> (NOTE: this review comment applies to other callback function comments
> too, so please hunt them all down)
>
> --

Updated.

>
> 5. src/backend/replication/pgoutput/pgoutput.c - data / txndata
>
>  static void
>  pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
>  {
> + PGOutputTxnData*data = MemoryContextAllocZero(ctx->context,
> + sizeof(PGOutputTxnData));
>
> =>
>
> There is some inconsistent naming of the local variable in the patch.
> Sometimes it is called "data"; Sometimes it is called "txdata" etc. It
> would be better to just stick with the same variable name everywhere.
>
> (NOTE: this comment applies to several places in this patch)
>
> --

I've changed all occurrences of PGOutputTxnData to txndata. Note that
there is another structure, PGOutputData, which still uses the name
data.

>
> 6. src/backend/replication/pgoutput/pgoutput.c - Strange way to use Assert
>
> + /* If not streaming, should have setup txndata as part of
> BEGIN/BEGIN PREPARE */
> + if (!in_streaming)
> + Assert(txndata);
> +
>
> =>
>
> This style of Assert code seemed strange to me. In production mode
> isn't that going to evaluate to some condition with a ((void) true)
> body? IMO it might be better to just include the streaming check as
> part of the Assert. For example:
>
> BEFORE
> if (!in_streaming)
> Assert(txndata);
>
> AFTER
> Assert(in_streaming || txndata);
>
> (NOTE: This same review comment applies in at least 3 places in this
> patch, so please hunt them all down)
>

Updated.

> --
>
> 7. src/backend/replication/pgoutput/pgoutput.c - comment wording
>
> @@ -677,6 +810,18 @@ pgoutput_change(LogicalDecodingContext *ctx,
> ReorderBufferTXN *txn,
>   Assert(false);
>   }
>
> + /*
> + * output BEGIN / BEGIN PREPARE if we haven't yet,
> + * while streaming no need to send BEGIN / BEGIN PREPARE.
> + */
> + if (!in_streaming && !txndata->sent_begin_txn)
>
> =>
>
> English not really that comment is. The comment should also start with
> uppercase.
>
> (NOTE: This same comment was in couple of places in the patch)
>

Updated.

regards,
Ajin Cherian
Fujitsu Australia


v9-0001-Skip-empty-transactions-for-logical-replication.patch
Description: Binary data


Re: proposal: enhancing plpgsql debug API - returns text value of variable content

2021-07-22 Thread Aleksander Alekseev
Hi Pavel,

>> I am sending an enhanced patch about the regress test for plpgsql's
debug API.

Thanks for the test! I noticed some little issues with formatting and
typos. The corrected patch is attached.

> override CPPFLAGS := $(CPPFLAGS) -I$(top_srcdir)/src/pl/plpgsql/src

You probably already noticed, but for the record - AppVeyor still doesn't
seem to be happy [1]:

```
src/test/modules/test_dbgapi/test_dbgapi.c(17): fatal error C1083: Cannot
open include file: 'plpgsql.h': No such file or directory
[C:\projects\postgresql\test_dbgapi.vcxproj]
```

[1]:
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.141500

-- 
Best regards,
Aleksander Alekseev


v7-0001-enhancing-plpgsql-dbgapi.patch
Description: Binary data


Re: window build doesn't apply PG_CPPFLAGS correctly

2021-07-22 Thread Pavel Stehule
On Thu, 22 Jul 2021 at 14:04, Andrew Dunstan wrote:

>
> On 7/22/21 12:06 AM, Pavel Stehule wrote:
> > Hi
> >
> > I tried to write test for plpgsql debug API, where I need to access to
> > plpgsql.h
> >
> > I have line
> >
> > PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plpgsql/src
> >
> > that is working well on unix, but it do nothing on windows
> >
> > [00:05:14] Project "C:\projects\postgresql\pgsql.sln" (1) is building
> > "C:\projects\postgresql\test_dbgapi.vcxproj" (87) on node 1 (default
> > targets).
> > [00:05:14] PrepareForBuild:
> > [00:05:14]   Creating directory ".\Release\test_dbgapi\".
> > [00:05:14]   Creating directory
> ".\Release\test_dbgapi\test_dbgapi.tlog\".
> > [00:05:14] InitializeBuildStatus:
> > [00:05:14]   Creating
> > ".\Release\test_dbgapi\test_dbgapi.tlog\unsuccessfulbuild" because
> > "AlwaysCreate" was specified.
> > [00:05:14] ClCompile:
> > [00:05:14]   C:\Program Files (x86)\Microsoft Visual Studio
> > 12.0\VC\bin\x86_amd64\CL.exe /c /Isrc/include /Isrc/include/port/win32
> > /Isrc/include/port/win32_msvc /Zi /nologo /W3 /WX- /Ox /D WIN32 /D
> > _WINDOWS /D __WINDOWS__ /D __WIN32__ /D WIN32_STACK_RLIMIT=4194304 /D
> > _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _WINDLL /D
> > _MBCS /GF /Gm- /EHsc /MD /GS /fp:precise /Zc:wchar_t /Zc:forScope
> > /Fo".\Release\test_dbgapi\\" /Fd".\Release\test_dbgapi\vc120.pdb" /Gd
> > /TC /wd4018 /wd4244 /wd4273 /wd4102 /wd4090 /wd4267 /errorReport:queue
> > /MP src/test/modules/test_dbgapi/test_dbgapi.c
> > [00:05:14]   test_dbgapi.c
> > [00:05:16] src/test/modules/test_dbgapi/test_dbgapi.c(17): fatal error
> > C1083: Cannot open include file: 'plpgsql.h': No such file or
> > directory [C:\projects\postgresql\test_dbgapi.vcxproj]
> > [00:05:16] Done Building Project
> > "C:\projects\postgresql\test_dbgapi.vcxproj" (default targets) -- FAILED.
> > [00:05:16] Project "C:\projects\postgresql\pgsql.sln" (1) is building
> > "C:\projects\postgresql\test_ddl_deparse.vcxproj" (88) on node 1
> > (default targets).
> >
> > looks so PG_CPPFLAGS is not propagated to CPPFLAGS there.
> >
> >
>
> > Almost everything in the Makefiles is not used by the MSVC build system.
> Using this one seems likely to be quite difficult, since the syntax for
> the MSVC compiler command line is very different, and furthermore the
> MSVC build system doesn't know anything about how to use this setting.
>
> AFAICT PG_CPPFLAGS is only used by pgxs.
>
> You would need to tell us more about how your build process is working.
>

I need access to plpgsql.h at build time. This is only one dependency. When
I build an extension, plpgsql.h is in a shared directory. But when I build a
module for a test, the header files are not installed yet, so the build
requires an include dir: -I$(top_srcdir)/src/pl/plpgsql/src

Regards

Pavel



>
> cheers
>
>
> andrew
>
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>


Use quick select instead of qsort to get median

2021-07-22 Thread houzj.f...@fujitsu.com
Hi,

When I was writing an extension which needs to get the median of an array, I
tried to find out whether postgres provides some API that can do that. I
found that all the places in postgres invoke qsort() and then take the middle
element. I was wondering whether we can do better by using "quick select",
and whether it is worth it.

Currently, there are some places[1] in the code that need the median and
could use "quick select" instead. And some of them (spg_box_quad_picksplit /
spg_range_quad_picksplit) are invoked frequently during INSERT or CREATE
INDEX. So, personally, I think it's acceptable to introduce a quickselect API
to improve these places.

Since most of the logic of "quick select" is similar to quicksort, I think we
can reuse the code in sort_template.h. We only need to let the sort stop once
it has found the target Kth element.
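
To make the idea concrete, here is a minimal standalone quickselect sketch on
plain ints (the POC reuses sort_template.h instead; the names here are
illustrative):

    /* Partially partition a[lo..hi] until a[k] holds the value it would
     * have after a full sort, descending into only the side containing k. */
    static void
    quick_select(int *a, int lo, int hi, int k)
    {
        while (lo < hi)
        {
            int     pivot = a[(lo + hi) / 2];
            int     i = lo,
                    j = hi;

            while (i <= j)
            {
                while (a[i] < pivot)
                    i++;
                while (a[j] > pivot)
                    j--;
                if (i <= j)
                {
                    int     tmp = a[i];

                    a[i] = a[j];
                    a[j] = tmp;
                    i++;
                    j--;
                }
            }
            if (k <= j)
                hi = j;         /* target is in the left part */
            else if (k >= i)
                lo = i;         /* target is in the right part */
            else
                break;          /* a[k] is already in place */
        }
    }

    /* median of n elements: quick_select(a, 0, n - 1, n / 2); read a[n / 2] */

The expected cost is O(n) instead of the O(n log n) of a full qsort, which is
where the gain comes from.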

Attached is a POC patch for this idea. I did some simple performance tests; I
can see about a 10% performance gain in this test[2].

Thoughts ?

[1]
1.
entry_dealloc
...
/* Record the (approximate) median usage */
if (i > 0)
pgss->cur_median_usage = entries[i / 2]->counters.usage;
2.
spg_box_quad_picksplit
qsort(lowXs, in->nTuples, sizeof(float8), compareDoubles);
...
centroid->low.x = lowXs[median];

3.
spg_range_quad_picksplit
qsort(lowerBounds, nonEmptyCount, sizeof(RangeBound),
...
centroid = range_serialize(typcache, &lowerBounds[median],

4.
spg_quad_picksplit
qsort(sorted, in->nTuples, sizeof(*sorted), x_cmp);
...
centroid->x = sorted[median]->x;



[2]
drop table quad_box_tbl;
CREATE unlogged TABLE quad_box_tbl (id int, b box);
truncate quad_box_tbl ;
explain (verbose, analyze) INSERT INTO quad_box_tbl
  SELECT (x - 1) * 10 + x, box(point(x * 10, x * 20), point(x * 10, x * 20 + 5))
  FROM generate_series(1, 100) x order by random();

---test create index
drop index quad_box_tbl_idx;
CREATE INDEX quad_box_tbl_idx ON quad_box_tbl USING spgist(b);

---test results
PATCH:
Time: 2609.664 ms (00:02.610)

HEAD:
Time: 2903.765 ms (00:02.944)

Best regards,
Houzj



0001-test-use-quick-select-to-get-median.patch
Description: 0001-test-use-quick-select-to-get-median.patch


Re: window build doesn't apply PG_CPPFLAGS correctly

2021-07-22 Thread Andrew Dunstan


On 7/22/21 12:06 AM, Pavel Stehule wrote:
> Hi
>
> I tried to write test for plpgsql debug API, where I need to access to
> plpgsql.h
>
> I have line
>
> PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plpgsql/src
>
> that is working well on unix, but it do nothing on windows
>
> [00:05:14] Project "C:\projects\postgresql\pgsql.sln" (1) is building
> "C:\projects\postgresql\test_dbgapi.vcxproj" (87) on node 1 (default
> targets).
> [00:05:14] PrepareForBuild:
> [00:05:14]   Creating directory ".\Release\test_dbgapi\".
> [00:05:14]   Creating directory ".\Release\test_dbgapi\test_dbgapi.tlog\".
> [00:05:14] InitializeBuildStatus:
> [00:05:14]   Creating
> ".\Release\test_dbgapi\test_dbgapi.tlog\unsuccessfulbuild" because
> "AlwaysCreate" was specified.
> [00:05:14] ClCompile:
> [00:05:14]   C:\Program Files (x86)\Microsoft Visual Studio
> 12.0\VC\bin\x86_amd64\CL.exe /c /Isrc/include /Isrc/include/port/win32
> /Isrc/include/port/win32_msvc /Zi /nologo /W3 /WX- /Ox /D WIN32 /D
> _WINDOWS /D __WINDOWS__ /D __WIN32__ /D WIN32_STACK_RLIMIT=4194304 /D
> _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _WINDLL /D
> _MBCS /GF /Gm- /EHsc /MD /GS /fp:precise /Zc:wchar_t /Zc:forScope
> /Fo".\Release\test_dbgapi\\" /Fd".\Release\test_dbgapi\vc120.pdb" /Gd
> /TC /wd4018 /wd4244 /wd4273 /wd4102 /wd4090 /wd4267 /errorReport:queue
> /MP src/test/modules/test_dbgapi/test_dbgapi.c
> [00:05:14]   test_dbgapi.c
> [00:05:16] src/test/modules/test_dbgapi/test_dbgapi.c(17): fatal error
> C1083: Cannot open include file: 'plpgsql.h': No such file or
> directory [C:\projects\postgresql\test_dbgapi.vcxproj]
> [00:05:16] Done Building Project
> "C:\projects\postgresql\test_dbgapi.vcxproj" (default targets) -- FAILED.
> [00:05:16] Project "C:\projects\postgresql\pgsql.sln" (1) is building
> "C:\projects\postgresql\test_ddl_deparse.vcxproj" (88) on node 1
> (default targets).
>
> looks so PG_CPPFLAGS is not propagated to CPPFLAGS there.
>
>

Almost everything in the Makefiles is not used by the MSVC build system.
Using this one seems likely to be quite difficult, since the syntax for
the MSVC compiler command line is very different, and furthermore the
MSVC build system doesn't know anything about how to use this setting.

AFAICT PG_CPPFLAGS is only used by pgxs.

You would need to tell us more about how your build process is working.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





RE: Skipping logical replication transactions on subscriber side

2021-07-22 Thread houzj.f...@fujitsu.com
On July 20, 2021 9:26 AM Masahiko Sawada  wrote:
> On Mon, Jul 19, 2021 at 8:38 PM houzj.f...@fujitsu.com
>  wrote:
> >
> > On July 19, 2021 2:40 PM Masahiko Sawada 
> wrote:
> > > I've attached the updated version patch that incorporated all
> > > comments I got so far except for the clearing error details part I
> > > mentioned above. After getting a consensus on those parts, I'll
> > > incorporate the idea into the patches.
> >
> > 3) For 0003 patch, if user set skip_xid to a wrong xid which have not been
> >assigned, and then will the change be skipped when the xid is assigned in
> >the future even if it doesn't cause any conflicts ?
> 
> Yes. Currently, setting a correct xid is the user's responsibility. I think
> it would be better to disable it or emit a WARNING/ERROR when the user
> mistakenly sets the wrong xid, if we find a convenient way to detect that.

Thanks for the explanation. As Amit suggested, it seems we can document the
risk of misusing skip_xid. Besides, I found some minor things in the patch.

1) In 0002 patch

+ */
+static void
+pgstat_recv_subscription_purge(PgStat_MsgSubscriptionPurge *msg, int len)
+{
+   if (subscriptionErrHash != NULL)
+   return;
+

+static void
+pgstat_recv_subscription_error(PgStat_MsgSubscriptionErr *msg, int len)
+{

the second parameter "len" seems to be unused in the functions
pgstat_recv_subscription_purge() and pgstat_recv_subscription_error().


2) in 0003 patch

  * Helper function for apply_handle_commit and apply_handle_stream_commit.
  */
 static void
-apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
+apply_handle_commit_internal(LogicalRepCommitData *commit_data)
 {

This looks like a separate change which removes an unused parameter from
existing code; maybe we can get this committed first?

Best regards,
Houzj


Re: row filtering for logical replication

2021-07-22 Thread Amit Kapila
On Tue, Jul 20, 2021 at 4:33 PM Dilip Kumar  wrote:
>
> On Tue, Jul 20, 2021 at 3:43 PM Tomas Vondra
>  wrote:
> >
> > Do we log the TOAST-ed values that were not updated?
>
> No, we don't, I have submitted a patch sometime back to fix that [1]
>

That patch seems to log WAL for unchanged key columns. What about
unchanged non-key columns? Do they get logged as part of the new tuple
or is there some other way we can get those? If not, then we probably
need to think of restricting the filter clause in some way.

-- 
With Regards,
Amit Kapila.




Re: [POC] verifying UTF-8 using SIMD instructions

2021-07-22 Thread John Naylor
On Wed, Jul 21, 2021 at 8:08 PM Thomas Munro  wrote:
>
> On Thu, Jul 22, 2021 at 6:16 AM John Naylor

> One question is whether this "one size fits all" approach will be
> extensible to wider SIMD.

Sure, it'll just take a little more work and complexity. For one, 16-byte
SIMD can operate on 32-byte chunks with a bit of repetition:

-   __m128i input;
+   __m128i input1;
+   __m128i input2;

-#define SIMD_STRIDE_LENGTH (sizeof(__m128i))
+#define SIMD_STRIDE_LENGTH 32

while (len >= SIMD_STRIDE_LENGTH)
{
-   input = vload(s);
+   input1 = vload(s);
+   input2 = vload(s + sizeof(input1));

-   check_for_zeros(input, &error);
+   check_for_zeros(input1, &error);
+   check_for_zeros(input2, &error);

/*
 * If the chunk is all ASCII, we can skip the full UTF-8
check, but we
@@ -460,17 +463,18 @@ pg_validate_utf8_sse42(const unsigned char *s, int
len)
 * sequences at the end. We only update prev_incomplete if
the chunk
 * contains non-ASCII, since the error is cumulative.
 */
-   if (is_highbit_set(input))
+   if (is_highbit_set(bitwise_or(input1, input2)))
{
-   check_utf8_bytes(prev, input, &error);
-   prev_incomplete = is_incomplete(input);
+   check_utf8_bytes(prev, input1, &error);
+   check_utf8_bytes(input1, input2, &error);
+   prev_incomplete = is_incomplete(input2);
}
else
{
error = bitwise_or(error, prev_incomplete);
}

-   prev = input;
+   prev = input2;
s += SIMD_STRIDE_LENGTH;
len -= SIMD_STRIDE_LENGTH;
}

So with a few #ifdefs, we can accommodate two sizes if we like.

For another, the prevN() functions would need to change, at least on x86 --
that would require replacing _mm_alignr_epi8() with _mm256_alignr_epi8()
plus _mm256_permute2x128_si256(). Also, we might have to do something with
the vector typedef.

That said, I think we can punt on that until we have an application that's
much more compute-intensive. As it is with SSE4, COPY FROM WHERE  already pushes the utf8 validation way down in profiles.

> FWIW here are some performance results from my humble RPI4:
>
> master:
>
>  chinese | mixed | ascii
> -+---+---
> 4172 |  2763 |  1823
> (1 row)
>
> Your v15 patch:
>
>  chinese | mixed | ascii
> -+---+---
> 2267 |  1248 |   399
> (1 row)
>
> Your v15 patch set + the NEON patch, configured with USE_UTF8_SIMD=1:
>
>  chinese | mixed | ascii
> -+---+---
>  909 |   620 |   318
> (1 row)
>
> It's so good I wonder if it's producing incorrect results :-)

Nice! If it passes regression tests, it *should* be fine, but stress
testing would be welcome on any platform.

> I also tried to do a quick and dirty AltiVec patch to see if it could
> fit into the same code "shape", with less immediate success: it works
> out slower than the fallback code on the POWER7 machine I scrounged an
> account on.  I'm not sure what's wrong there, but maybe it's a useful
> start (I'm probably confused about endianness, or the encoding of
> boolean vectors which may be different (is true 0x01 or 0xff, does it
> matter?), or something else, and it's falling back on errors all the
> time?).

Hmm, I have access to a power8 machine to play with, but I also don't mind
having some type of server-class hardware that relies on the recent nifty
DFA fallback, which performs even better on powerpc64le than v15.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [BUG]Update Toast data failure in logical replication

2021-07-22 Thread Amit Kapila
On Thu, Jun 3, 2021 at 5:15 PM Dilip Kumar  wrote:
>
> On Wed, Jun 2, 2021 at 7:23 PM Dilip Kumar  wrote:
> >
> > On Wed, Jun 2, 2021 at 7:20 PM Kuntal Ghosh  
> > wrote:
> > >
> > > On Wed, Jun 2, 2021 at 3:10 PM Dilip Kumar  wrote:
> > > >
> > > > Yes, you are right.  I will change it in the next version, along with
> > > > the test case.
> > > >
> > > +/*
> > > + * if the key hasn't changed and we're only logging the key, we're 
> > > done.
> > > + * But if tuple has external data then we might have to detoast the 
> > > key.
> > > + */
> > > This doesn't really mention why we need to detoast the key even when
> > > the key remains the same. I guess we can add some more details here.
> >
> > Noted, let's see what others have to say about fixing this, then I
> > will fix this along with one other pending comment and I will also add
> > the test case.  Thanks for looking into this.
>
> I have fixed all the pending issues, I see there is already a test
> case for this so I have changed the output for that.
>

IIUC, this issue occurs because we don't log the actual key value for
unchanged toast key. It is neither logged as part of old_key_tuple nor
for new tuple due to which we are not able to find it in the
apply-side when we searched it via FindReplTupleInLocalRel. Now, I
think it will work if we LOG the entire unchanged toasted value as you
have done in the patch but can we explore some other way to fix it. In
the subscriber-side, can we detect that the key column has toasted
value in the new tuple and try to first fetch it and then do the index
search for the fetched toasted value? I am not sure about the
feasibility of this but wanted to see if we can someway avoid logging
unchanged toasted key value as that might save us from additional WAL.

-- 
With Regards,
Amit Kapila.




Re: postgresql.conf.sample missing units

2021-07-22 Thread Pavel Luzanov



On 21.07.2021 17:33, John Naylor wrote:
I pushed this and backpatched to v12. I also extracted just the 
"bytes" part and applied it to v11.

It's a little more complicated, but it's the right decision.
Thank you.

Pavel Luzanov
Postgres Professional: https://postgrespro.com
The Russian Postgres Company





Re: alter table set TABLE ACCESS METHOD

2021-07-22 Thread Justin Pryzby
On Thu, Jul 22, 2021 at 10:37:12AM +0530, vignesh C wrote:
> On Fri, Jul 16, 2021 at 9:19 AM Justin Pryzby  wrote:
> >
> > rebased.
> >
> > Also, there were two redundant checks for multiple SET ACCESS METHOD 
> > commands.
> > But one of them wasn't hit if the ALTER was setting the current AM due to 
> > the
> > no-op test.
> >
> > I think it's better to fail in every case, and not just sometimes 
> > (especially
> > if we were to use ERRCODE_SYNTAX_ERROR).
> >
> > I included my 2ndary patch allowing to set the AM of partitioned table, 
> > same as
> > for a tablespace.
> 
> One of the tests is failing, please post an updated patch for this:
> create_am.out   2021-07-22 10:34:56.234654166 +0530

It looks like one hunk was missing/uncommitted from the 0002 patch.

-- 
Justin
From 2e6748cadc56fad2a29a5cec895eb6796e890198 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Wed, 5 May 2021 14:28:59 -0700
Subject: [PATCH 1/2] ALTER TABLE ... SET ACCESS METHOD

Adds support for changing the access method of a table with a
rewrite.

Author: Justin Pryzby, Jeff Davis
---
 doc/src/sgml/ref/alter_table.sgml   | 20 
 src/backend/commands/cluster.c  | 21 +---
 src/backend/commands/matview.c  |  5 +-
 src/backend/commands/tablecmds.c| 68 +++--
 src/backend/parser/gram.y   |  8 +++
 src/bin/psql/tab-complete.c | 10 ++--
 src/include/commands/cluster.h  |  4 +-
 src/include/commands/event_trigger.h|  1 +
 src/include/nodes/parsenodes.h  |  1 +
 src/test/regress/expected/create_am.out | 34 +
 src/test/regress/sql/create_am.sql  | 21 
 11 files changed, 175 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 1c7f48938b..bf90040aa2 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -75,6 +75,7 @@ ALTER TABLE [ IF EXISTS ] name
 CLUSTER ON index_name
 SET WITHOUT CLUSTER
 SET WITHOUT OIDS
+SET ACCESS METHOD new_access_method
 SET TABLESPACE new_tablespace
 SET { LOGGED | UNLOGGED }
 SET ( storage_parameter [= value] [, ... ] )
@@ -692,6 +693,16 @@ WITH ( MODULUS numeric_literal, REM
 

 
+   
+SET ACCESS METHOD
+
+ 
+  This form changes the access method of the table by rewriting it. See
+   for more information.
+ 
+
+   
+

 SET TABLESPACE
 
@@ -1228,6 +1239,15 @@ WITH ( MODULUS numeric_literal, REM
   
  
 
+ 
+  new_access_method
+  
+   
+The name of the access method to which the table will be converted.
+   
+  
+ 
+
  
   new_tablespace
   
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 2912b4c257..45bf1276a5 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -641,6 +641,7 @@ static void
 rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
+	Oid			accessMethod = OldHeap->rd_rel->relam;
 	Oid			tableSpace = OldHeap->rd_rel->reltablespace;
 	Oid			OIDNewHeap;
 	char		relpersistence;
@@ -661,6 +662,7 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 
 	/* Create the transient table that will receive the re-ordered data */
 	OIDNewHeap = make_new_heap(tableOid, tableSpace,
+			   accessMethod,
 			   relpersistence,
 			   AccessExclusiveLock);
 
@@ -682,16 +684,16 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 /*
  * Create the transient table that will be filled with new data during
  * CLUSTER, ALTER TABLE, and similar operations.  The transient table
- * duplicates the logical structure of the OldHeap, but is placed in
- * NewTableSpace which might be different from OldHeap's.  Also, it's built
- * with the specified persistence, which might differ from the original's.
+ * duplicates the logical structure of the OldHeap; but will have the
+ * specified physical storage properties NewTableSpace, NewAccessMethod, and
+ * relpersistence.
  *
  * After this, the caller should load the new heap with transferred/modified
  * data, then call finish_heap_swap to complete the operation.
  */
 Oid
-make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
-			  LOCKMODE lockmode)
+make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
+			  char relpersistence, LOCKMODE lockmode)
 {
 	TupleDesc	OldHeapDesc;
 	char		NewHeapName[NAMEDATALEN];
@@ -750,7 +752,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
 		  InvalidOid,
 		  InvalidOid,
 		  OldHeap->rd_rel->relowner,
-		  OldHeap->rd_rel->relam,
+		  NewAccessMethod,
 		  OldHeapDesc,
 		  NIL,
 		  RELKIND_RELATION,
@@ -1100,6 +1102,10 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 		

Re: psql - add SHOW_ALL_RESULTS option

2021-07-22 Thread Pavel Stehule
On Thu, Jul 22, 2021 at 11:00, Fabien COELHO 
wrote:

>
> Hello Pavel,
>
> >> The newly added PSQL_WATCH_PAGER feature which broke the patch does not
> >> seem to be tested anywhere, this is tiring:-(
> >
> > Do you have any idea how this can be tested?
>
> The TAP patch sent by Peter on this thread is a very good start.
>
> > It requires some pager that doesn't use blocking reading, and you need
> > to do remote control of this pager. So it requires a really especially
> > written pager just for this purpose. It is solvable, but I am not sure
> > if it is adequate to this patch.
>
> Not really: The point would not be to test the pager itself (that's for
> the people who develop the pager, not for psql), but just to test that the
> pager is actually started or not started by psql depending on conditions
> (eg pset pager…) and that it does *something* when started. See for
> instance the simplistic pager.pl script attached, the output of which
> could be tested. Note that PSQL_PAGER is not tested at all either.
> Basically "psql" is not tested, which is a pain when developing a non
> trivial patch.
>

At minimum, for PSQL_WATCH_PAGER the pager used in a test should exit after
some time, but before psql tries to read data again; otherwise psql will hang.

Could a solution be a special mode for psql, where psql writes to a logfile
and redirects output to a file instead of using any (simplified) pager?
Theoretically, there is nothing special about the use of a pager, so you
could just test redirection to a file - which is not tested either. In this
mode, you could send SIGINT to psql, which would emulate a SIGINT to the
pager in PSQL_WATCH_PAGER mode.




> --
> Fabien.


Re: Case expression pushdown

2021-07-22 Thread Alexander Pyhalov

Tom Lane писал 2021-07-21 19:49:

Gilles Darold  writes:

I'm attaching the v5 patch again as it doesn't appears in the Latest
attachment list in the commitfest.


So this has a few issues:


Hi.



1. In foreign_expr_walker, you're failing to recurse to either the
"arg" or "defresult" subtrees of a CaseExpr, so that it would fail
to notice unshippable constructs within those.


Fixed this.



2. You're also failing to guard against the hazard that a WHEN
expression within a CASE-with-arg has been expanded into something
that doesn't look like "CaseTestExpr = something".  As written,
this patch would likely dump core in that situation, and if it didn't
it would send nonsense to the remote server.  Take a look at the
check for that situation in ruleutils.c (starting at line 8764
as of HEAD) and adapt it to this.  Probably what you want is to
just deem the CASE un-pushable if it's been modified away from that
structure.  This is enough of a corner case that optimizing it
isn't worth a great deal of trouble ... but crashing is not ok.



I think I fixed this by copying the check from ruleutils.c.



3. A potentially uncomfortable issue for the CASE-with-arg syntax
is that the specific equality operator being used appears nowhere
in the decompiled expression, thus raising the question of whether
the remote server will interpret it the same way we did.  Given
that we restrict the values-to-be-compared to be of shippable
types, maybe this is safe in practice, but I have a bad feeling
about it.  I wonder if we'd be better off just refusing to ship
CASE-with-arg at all, which would a-fortiori avoid point 2.


I'm not sure how 'case a when b ...' is different from 'case when a=b ...'
in this case. If the type of a or b is not shippable, we will not push down
this expression in any way. And if they are of built-in types, why do
these expressions differ?
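
For illustration, the two spellings being compared here (t, a and b are
hypothetical):

  SELECT CASE a WHEN b THEN 'eq' ELSE 'ne' END FROM t;    -- CASE with arg
  SELECT CASE WHEN a = b THEN 'eq' ELSE 'ne' END FROM t;  -- expanded form

In the first form the equality operator never appears in the deparsed text,
which is Tom's concern in point 3; in the second it is spelled out.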



4. I'm not sure that I believe any part of the collation handling.
There is the question of what collations will be used for the
individual WHEN comparisons, which can probably be left for
the recursive checks of the CaseWhen.expr subtrees to handle;
and then there is the separate issue of whether the CASE's result
collation (which arises from the CaseWhen.result exprs plus the
CaseExpr.defresult expr) can be deemed to be safely derived from
remote Vars.  I haven't totally thought through how that should
work, but I'm pretty certain that handling the CaseWhen's within
separate recursive invocations of foreign_expr_walker cannot
possibly get it right.  However, you'll likely have to flatten
those anyway (i.e., handle them within the loop in the CaseExpr
case) while fixing point 2.


I've tried to account for the fact that we are interested only in the
caseWhen->result collations, but I'm still not sure I'm right here.



5. This is a cosmetic point, but: the locations of the various
additions in deparse.c seem to have been chosen with the aid
of a dartboard.  We do have a convention for this sort of thing,
which is to lay out code concerned with different node types
in the same order that the node types are declared in *nodes.h.
I'm not sufficiently anal to want to fix the existing violations
of that rule that I see in deparse.c; but the fact that somebody
got this wrong before isn't license to make things worse.

regards, tom lane


Fixed this.

Thanks for review.

--
Best regards,
Alexander Pyhalov,
Postgres Professional
From f323ce6e6e12004ee448bbd6721c396826bd9eeb Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov 
Date: Thu, 22 Jul 2021 11:42:16 +0300
Subject: [PATCH] Allow pushing CASE expression to foreign server

---
 contrib/postgres_fdw/deparse.c| 136 +
 .../postgres_fdw/expected/postgres_fdw.out| 184 ++
 contrib/postgres_fdw/sql/postgres_fdw.sql |  63 ++
 3 files changed, 383 insertions(+)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c6..f0cbd958890 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -186,6 +186,8 @@ static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 	deparse_expr_cxt *context);
 
+static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context);
+
 /*
  * Helper functions
  */
@@ -627,6 +629,91 @@ foreign_expr_walker(Node *node,
 state = FDW_COLLATE_NONE;
 			}
 			break;
+		case T_CaseExpr:
+			{
+ListCell   *lc;
+foreign_loc_cxt tmp_cxt;
+CaseExpr   *ce = (CaseExpr *) node;
+
+/*
+ * case arg expression collation doesn't affect result
+ * collation
+ */
+tmp_cxt.collation = InvalidOid;
+tmp_cxt.state = FDW_COLLATE_NONE;
+if (ce->arg && !foreign_expr_walker((Node *) ce->arg, glob_cxt, &tmp_cxt))
+	return false;
+
+/* Recurse to case clause subexpressions. */
+foreach(lc, ce->args)
+{
+	CaseWhen   *cw = 

Re: BUG #15293: Stored Procedure Triggered by Logical Replication is Unable to use Notification Events

2021-07-22 Thread Artur Zakirov
Hello hackers,

On Wed, Jul 14, 2021 at 11:30 AM Sergey Fedchenko  wrote:
>
> Hi all! I still can reproduce it on 14beta1 version. I adapted a patch I 
> found in this thread 
> https://github.com/seregayoga/postgres/commit/338bc33f2cf77edde7c45bfdfb9f39a92ec57eb8
>  . It solved this bug for me (tested with simple Go program using 
> https://github.com/lib/pq). It would be nice to have this bug fixed. I’m not 
> so familiar with postgres code base, but would glad to help with testing.

We ran into this bug too in one of our projects at our company.

I've attached a patch which fixes it in a different way: it calls
SignalBackends() in AtCommit_Notify(). Calling SignalBackends() outside of
ProcessCompletedNotifies() became possible after commit
51004c7172b5c35afac050f4d5849839a06e8d3b, which removed the need to check
SignalBackends()'s result.

Moving SignalBackends() to AtCommit_Notify() makes it possible for
"non-normal" backends, including logical replication workers, to signal
listening backends. It seems safe to call SignalBackends() in
AtCommit_Notify(), since SignalBackends() doesn't raise any error except
on OOM.

Regarding Andres concern:
> what I am wondering is what happens if there's a background worker (like
> the replication worker, but it could be other things too) that queues
> notifications, but no normal backends are actually listening. As far as
> I can tell, in that case we'd continue to queue stuff into the slru, but
> wouldn't actually clean things up until a normal session gets around to
> it? Which might be a while, on e.g. a logical replica.

A backend which raises notifications calls asyncQueueAdvanceTail() from time
to time, which truncates pages up to the new tail; the tail is equal to the
head if there are no listening backends. But there will be a problem if a
backend is listening but doesn't process incoming notifications and doesn't
update its queue position. In that case asyncQueueAdvanceTail() can advance
the tail only up to that backend's queue position.
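
For reference, a minimal way to hit this code path on a subscriber (all
names hypothetical):

  CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
  BEGIN
    PERFORM pg_notify('tbl_changed', NEW.id::text);
    RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER tbl_notify AFTER INSERT ON tbl
    FOR EACH ROW EXECUTE FUNCTION notify_change();
  ALTER TABLE tbl ENABLE ALWAYS TRIGGER tbl_notify;

Without the fix, a session that has done LISTEN tbl_changed is not signalled
when inserts arrive via the logical replication apply worker.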

--
Artur Zakirov
PostgreSQL Developer at Adjust
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 4b16fb5682..9093af42bd 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -69,16 +69,18 @@
  *
  *	  After commit we are called another time (AtCommit_Notify()). Here we
  *	  make the actual updates to the effective listen state (listenChannels).
+ *	  Also we check if we need to signal listening backends.  In
+ *	  SignalBackends() we scan the list of listening backends and send a
+ *	  PROCSIG_NOTIFY_INTERRUPT signal to every listening backend (we don't know
+ *	  which backend is listening on which channel so we must signal them all).
+ *	  We can exclude backends that are already up to date, though, and we can
+ *	  also exclude backends that are in other databases (unless they are way
+ *	  behind and should be kicked to make them advance their pointers).
+ *	  We don't bother with a self-signal either.
  *
  *	  Finally, after we are out of the transaction altogether, we check if
- *	  we need to signal listening backends.  In SignalBackends() we scan the
- *	  list of listening backends and send a PROCSIG_NOTIFY_INTERRUPT signal
- *	  to every listening backend (we don't know which backend is listening on
- *	  which channel so we must signal them all). We can exclude backends that
- *	  are already up to date, though, and we can also exclude backends that
- *	  are in other databases (unless they are way behind and should be kicked
- *	  to make them advance their pointers).  We don't bother with a
- *	  self-signal either, but just process the queue directly.
+ *	  we need to send out self-notifies.  In ProcessCompletedNotifies() we just
+ *	  process the queue directly.
  *
  * 5. Upon receipt of a PROCSIG_NOTIFY_INTERRUPT signal, the signal handler
  *	  sets the process's latch, which triggers the event to be processed
@@ -984,7 +986,9 @@ PreCommit_Notify(void)
  *
  *		This is called at transaction commit, after committing to clog.
  *
- *		Update listenChannels and clear transaction-local state.
+ *		Update listenChannels and clear transaction-local state.  If we issued
+ *		any notifications in the transaction, send signals to other backends to
+ *		process them.
  */
 void
 AtCommit_Notify(void)
@@ -1027,6 +1031,13 @@ AtCommit_Notify(void)
 	if (amRegisteredListener && listenChannels == NIL)
 		asyncQueueUnregister();
 
+	/*
+	 * Send signals to other backends. We call SignalBackends() only if there
+	 * are pending notifies which are added to the queue by PreCommit_Notify().
+	 */
+	if (pendingNotifies != NULL)
+		SignalBackends();
+
 	/* And clean up */
 	ClearPendingActionsAndNotifies();
 }
@@ -1200,14 +1211,13 @@ Exec_UnlistenAllCommit(void)
 }
 
 /*
- * ProcessCompletedNotifies --- send out signals and self-notifies
+ * ProcessCompletedNotifies --- send out self-notifies
  *
  * This is called from postgres.c just before going idle at the 

Re: psql - add SHOW_ALL_RESULTS option

2021-07-22 Thread Fabien COELHO


Hello Pavel,


The newly added PSQL_WATCH_PAGER feature which broke the patch does not
seem to be tested anywhere, this is tiring:-(


Do you have any idea how this can be tested?


The TAP patch sent by Peter on this thread is a very good start.

It requires some pager that doesn't use blocking reading, and you need 
to do remote control of this pager. So it requires a really especially 
written pager just for this purpose. It is solvable, but I am not sure 
if it is adequate to this patch.


Not really: The point would not be to test the pager itself (that's for 
the people who develop the pager, not for psql), but just to test that the 
pager is actually started or not started by psql depending on conditions 
(eg pset pager…) and that it does *something* when started. See for 
instance the simplistic pager.pl script attached, the output of which 
could be tested. Note that PSQL_PAGER is not tested at all either. 
Basically "psql" is not tested, which is a pain when developing a non 
trivial patch.


--
Fabien.
#! /usr/bin/perl -wn
print "$.\t$_";
exit(0) if $. == 30;


Re: ORDER BY pushdowns seem broken in postgres_fdw

2021-07-22 Thread Ranier Vilela
On Thu, Jul 22, 2021 at 04:00, Ronan Dunklau 
wrote:

> On Thursday, July 22, 2021 at 02:16:52 CEST, Ranier Vilela wrote:
> > Unfortunately your patch does not apply clear into the head.
> > So I have a few suggestions on v2, attached with the .txt extension to
> > avoid cf bot.
> > Please, if ok, make the v3.
>
> Hum weird, it applied cleanly for me, and was formatted using git show
> which I
> admit is not ideal. Please find it reattached.
>
ranier@notebook2:/usr/src/postgres$ git apply < v2_fix_postgresfdw_orderby_handling.patch
error: patch failed: contrib/postgres_fdw/deparse.c:37
error: contrib/postgres_fdw/deparse.c: patch does not apply
error: patch failed: contrib/postgres_fdw/expected/postgres_fdw.out:3168
error: contrib/postgres_fdw/expected/postgres_fdw.out: patch does not apply
error: patch failed: contrib/postgres_fdw/postgres_fdw.c:916
error: contrib/postgres_fdw/postgres_fdw.c: patch does not apply
error: patch failed: contrib/postgres_fdw/postgres_fdw.h:165
error: contrib/postgres_fdw/postgres_fdw.h: patch does not apply
error: patch failed: contrib/postgres_fdw/sql/postgres_fdw.sql:873
error: contrib/postgres_fdw/sql/postgres_fdw.sql: patch does not apply
error: patch failed: src/backend/optimizer/path/equivclass.c:932
error: src/backend/optimizer/path/equivclass.c: patch does not apply
error: patch failed: src/include/optimizer/paths.h:144
error: src/include/optimizer/paths.h: patch does not apply


>
>
> >
> > 2. appendOrderbyUsingClause function
> > Put the buffer actions together?
> >
> Not sure what you mean here ?
>
+ appendStringInfoString(buf, " USING ");
+ deparseOperatorName(buf, operform);


>
> > 3. Apply style Postgres?
> > + if (!HeapTupleIsValid(tuple))
> > + {
> > + elog(ERROR, "cache lookup failed for operator family %u",
> > pathkey->pk_opfamily);
> > + }
> >
>
> Good catch !
>
>
> > 4. Assertion not ok here?
> > + em = find_em_for_rel(pathkey->pk_eclass, baserel);
> > + em_expr = em->em_expr;
> >   Assert(em_expr != NULL);
> >
>
> If we are here there should never be a case where the em can't be found. I
> moved the assertion where it makes sense though.
>
Your version of the function is_foreign_pathkey (v4) does not reduce the
scope of the variable PgFdwRelationInfo *fpinfo. I still prefer the v3
version.

The C ternary operator (? :) is nothing more than a disguised if/else.

regards,
Ranier Vilela


Re: ORDER BY pushdowns seem broken in postgres_fdw

2021-07-22 Thread Ronan Dunklau
On Thursday, July 22, 2021 at 09:44:54 CEST, David Rowley wrote:
> +-- This will not be pushed either
> +explain verbose select * from ft2 order by c1 using operator(public.<^);
> +  QUERY PLAN
> +---
>  + Sort  (cost=190.83..193.33 rows=1000 width=142)
> 
> 
> Can you also use explain (verbose, costs off) the same as the other
> tests in that area.  Having the costs there would never survive a run
> of the buildfarm. Different hardware will produce different costs, e.g
> 32-bit hardware might cost cheaper due to narrower widths.
> 

Sorry about that. Here it is. 


Regards,

-- 
Ronan Dunklau
>From 89217c39631440198111be931634ed2f227e08e0 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau 
Date: Wed, 21 Jul 2021 12:44:41 +0200
Subject: [PATCH v5] Fix postgres_fdw PathKey's handling.

The operator family being used for the sort was completely
ignored, and as such its existence on the foreign server was not
checked.
---
 contrib/postgres_fdw/deparse.c| 128 +-
 .../postgres_fdw/expected/postgres_fdw.out|  21 +++
 contrib/postgres_fdw/postgres_fdw.c   |  21 +--
 contrib/postgres_fdw/postgres_fdw.h   |   9 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql |   6 +
 src/backend/optimizer/path/equivclass.c   |  19 ++-
 src/include/optimizer/paths.h |   1 +
 7 files changed, 154 insertions(+), 51 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c..8e9fc512a1 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -37,9 +37,11 @@
 #include "access/sysattr.h"
 #include "access/table.h"
 #include "catalog/pg_aggregate.h"
+#include "catalog/pg_amop.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
@@ -180,6 +182,7 @@ static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
 			   Index ignore_rel, List **ignore_conds, List **params_list);
 static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
 static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
+static void appendOrderbyUsingClause(Oid sortop, Oid sortcoltype, deparse_expr_cxt *context);
 static void appendAggOrderBy(List *orderList, List *targetList,
 			 deparse_expr_cxt *context);
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
@@ -910,6 +913,37 @@ is_foreign_param(PlannerInfo *root,
 	return false;
 }
 
+/* Returns true if the given pathkey can be evaluated on the remote side
+ */
+bool
+is_foreign_pathkey(PlannerInfo *root,
+   RelOptInfo *baserel,
+   PathKey *pathkey)
+{
+	EquivalenceClass *pathkey_ec = pathkey->pk_eclass;
+	EquivalenceMember *em;
+	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
+
+	/*
+	 * is_foreign_expr would detect volatile expressions as well, but
+	 * checking ec_has_volatile here saves some cycles.
+	 */
+	if (pathkey_ec->ec_has_volatile)
+		return false;
+
+	em = find_em_for_rel(pathkey_ec, baserel);
+	if (em == NULL)
+		return false;
+
+	/*
+	 * Finally, verify the found member's expression is foreign and its operator
+	 * family is shippable.
+	 */
+	return (is_foreign_expr(root, baserel, em->em_expr) &&
+			is_shippable(pathkey->pk_opfamily, OperatorFamilyRelationId, fpinfo));
+}
+
+
 /*
  * Convert type OID + typmod info into a type name we can ship to the remote
  * server.  Someplace else had better have verified that this type name is
@@ -3125,6 +3159,37 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context)
 	appendStringInfoChar(buf, ')');
 }
 
+/*
+ * Append the USING <operator> part of an ORDER BY clause
+ */
+static void
+appendOrderbyUsingClause(Oid sortop, Oid sortcoltype, deparse_expr_cxt *context)
+{
+	StringInfo	buf = context->buf;
+	TypeCacheEntry *typentry;
+	typentry = lookup_type_cache(sortcoltype, TYPECACHE_LT_OPR | TYPECACHE_GT_OPR);
+
+	if (sortop == typentry->lt_opr)
+		appendStringInfoString(buf, " ASC");
+	else if (sortop == typentry->gt_opr)
+		appendStringInfoString(buf, " DESC");
+	else
+	{
+		HeapTuple	opertup;
+		Form_pg_operator operform;
+
+		appendStringInfoString(buf, " USING ");
+
+		/* Append operator name. */
+		opertup = SearchSysCache1(OPEROID, ObjectIdGetDatum(sortop));
+		if (!HeapTupleIsValid(opertup))
+			elog(ERROR, "cache lookup failed for operator %u", sortop);
+		operform = (Form_pg_operator) GETSTRUCT(opertup);
+		deparseOperatorName(buf, operform);
+		ReleaseSysCache(opertup);
+	}
+}
+
 /*
  * Append ORDER BY within aggregate function.
  */
@@ -3140,7 +3205,6 @@ appendAggOrderBy(List *orderList, List *targetList, deparse_expr_cxt *context)
 		SortGroupClause *srt = (SortGroupClause *) lfirst(lc);
 		Node	   *sortexpr;
 		Oid			sortcoltype;
-		TypeCacheEntry 

Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2021-07-22 Thread Ronan Dunklau
On Thursday, July 22, 2021 at 09:38:50 CEST, David Rowley wrote:
> On Thu, 22 Jul 2021 at 02:01, Ronan Dunklau  wrote:
> > I tested the 0001 patch against both HEAD and my proposed bugfix for
> > postgres_fdw.
> > 
> > There is a problem that the ordered aggregate is not pushed down anymore.
> > The underlying Sort node is correctly pushed down though.
> > 
> > This comes from the fact that postgres_fdw grouping path never contains
> > any
> > pathkey. Since the cost is fuzzily the same between the pushed-down
> > aggregate and the locally performed one, the tie is broken against the
> > pathkeys.
> I think this might be more down to a lack of any penalty cost for
> fetching foreign tuples. Looking at create_foreignscan_path(), I don't
> see anything that adds anything to account for fetching the tuples
> from the foreign server. If there was something like that then there'd
> be more of a preference to perform the remote aggregation so that
> fewer rows must arrive from the remote server.
> 
> I tested by adding: total_cost += cpu_tuple_cost * rows * 100; in
> create_foreignscan_path() and I got the plan with the remote
> aggregation. That's a fairly large penalty of 1.0 per row. Much bigger
> than parallel_tuple_cost's default value.
> 
> I'm a bit undecided on how much this patch needs to get involved in
> adjusting foreign scan costs.  The problem is that we've given the
> executor a new path to consider and nobody has done any proper
> costings for the foreign scan so that it properly prefers paths that
> have to pull fewer foreign tuples.  This is a pretty similar problem
> to what parallel_tuple_cost aims to fix. Also similar to how we had to
> add APPEND_CPU_COST_MULTIPLIER to have partition-wise aggregates
> prefer grouping at the partition level rather than at the partitioned
> table level.
> 
> > Ideally we would add the group pathkeys to the grouping path, but this
> > would add an additional ORDER BY expression matching the GROUP BY.
> > Moreover, some triaging of the pathkeys would be necessary since we now
> > mix the sort-in- aggref pathkeys with the group_pathkeys.
> 
> I think you're talking about passing pathkeys into
> create_foreign_upper_path in add_foreign_grouping_paths. If so, I
> don't really see how it would be safe to add pathkeys to the foreign
> grouping path. What if the foreign server did a Hash Aggregate?  The
> rows might come back in any random order.

Yes, I was suggesting adding a new path with the pathkeys factored in,
which, if chosen over the non-ordered path, would result in an additional
ORDER BY clause to prevent a HashAggregate. But that doesn't seem like a
good idea after all.

> 
> I kinda think that to fix this properly would need a new foreign
> server option such as foreign_tuple_cost. I'd feel better about
> something like that with some of the people with a vested interest in
> the FDW code were watching more closely. So far we've not managed to
> entice any of them with the bug report yet, but it's maybe early days
> yet.

We already have that in the form of fdw_tuple_cost as a server option, if I'm
not mistaken? It works as expected when the number of tuples is notably
reduced by the foreign group by.
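
For reference, it can be set per server, e.g. (server name hypothetical):

  ALTER SERVER loopback OPTIONS (ADD fdw_tuple_cost '0.5');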

The problem arises when the cardinality of the groups is equal to the input's
cardinality. I think even in that case we should try to use a remote
aggregate, since it's computation that will not happen on the local server. I
also think we're more likely to have up-to-date statistics remotely than the
ones collected locally on the foreign tables, and the estimated number of
groups would be more accurate on the remote side than the local one.

-- 
Ronan Dunklau






Re: Kerberos delegation support in libpq and postgres_fdw

2021-07-22 Thread Peifeng Qiu
Hi all.

I've slightly modified the patch to support "gssencmode" and added TAP tests.

Best regards,
Peifeng Qiu


From: Peifeng Qiu
Sent: Tuesday, July 20, 2021 11:05 AM
To: pgsql-hackers@lists.postgresql.org ; 
Magnus Hagander ; Stephen Frost ; Tom 
Lane 
Subject: Kerberos delegation support in libpq and postgres_fdw

Hi hackers.

This is the patch to add kerberos delegation support in libpq, which
enables postgres_fdw to connect to another server and authenticate
as the same user to the current login user. This will obsolete my
previous patch which requires keytab file to be present on the fdw
server host.

After the backend accepts the GSSAPI context, it may also get a
proxy credential if permitted by policy. I previously made a hack
to pass the pointer to the proxy credential directly into libpq. It
turns out that the correct way to do this is to store/acquire it using
a credential cache within local process memory, to prevent leaks.

Because no password is needed when querying a foreign table via
Kerberos delegation, the "password_required" option in the user
mapping must be set to false by a superuser. Other than this, it
should work with a normal user.
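
A sketch of the expected setup (all names hypothetical; gssencmode as a user
mapping option is what this patch adds):

  CREATE SERVER krb_srv FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'db2.example.com', dbname 'app');
  CREATE USER MAPPING FOR alice SERVER krb_srv
      OPTIONS (password_required 'false', gssencmode 'prefer');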

I only tested it manually in a very simple configuration so far.
I will go on to work on TAP tests for this.

How do you feel about this patch? Any feature/security concerns
about this?

Best regards,
Peifeng Qiu

commit a413b020afb663b5f89722b480c1b453e4304a36
Author: Peifeng Qiu 
Date:   Thu Jul 22 17:27:21 2021 +0900

kerberos delegation

diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 4593cbc540..855f9af09c 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -247,6 +247,9 @@ InitPgFdwOptions(void)
 		{"sslcert", UserMappingRelationId, true},
 		{"sslkey", UserMappingRelationId, true},
 
+		/* gssencmode is also a libpq option, same as above. */
+		{"gssencmode", UserMappingRelationId, true},
+
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 8cc23ef7fb..235c9b32f5 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -918,6 +918,7 @@ pg_GSS_recvauth(Port *port)
 	int			mtype;
 	StringInfoData buf;
 	gss_buffer_desc gbuf;
+	gss_cred_id_t proxy;
 
 	/*
 	 * Use the configured keytab, if there is one.  Unfortunately, Heimdal
@@ -947,6 +948,8 @@ pg_GSS_recvauth(Port *port)
 	 */
 	port->gss->ctx = GSS_C_NO_CONTEXT;
 
+	proxy = NULL;
+
 	/*
 	 * Loop through GSSAPI message exchange. This exchange can consist of
 	 * multiple messages sent in both directions. First message is always from
@@ -997,7 +1000,7 @@ pg_GSS_recvauth(Port *port)
 		  &port->gss->outbuf,
 		  &gflags,
 		  NULL,
-		  NULL);
+		  &proxy);
 
 		/* gbuf no longer used */
 		pfree(buf.data);
@@ -1009,6 +1012,9 @@ pg_GSS_recvauth(Port *port)
 
 		CHECK_FOR_INTERRUPTS();
 
+		if (proxy != NULL)
+			pg_store_proxy_credential(proxy);
+
 		if (port->gss->outbuf.length != 0)
 		{
 			/*
diff --git a/src/backend/libpq/be-gssapi-common.c b/src/backend/libpq/be-gssapi-common.c
index 38f58def25..cd243994c8 100644
--- a/src/backend/libpq/be-gssapi-common.c
+++ b/src/backend/libpq/be-gssapi-common.c
@@ -92,3 +92,40 @@ pg_GSS_error(const char *errmsg,
 			(errmsg_internal("%s", errmsg),
 			 errdetail_internal("%s: %s", msg_major, msg_minor)));
 }
+
+void
+pg_store_proxy_credential(gss_cred_id_t cred)
+{
+	OM_uint32 major, minor;
+	gss_OID_set mech;
+	gss_cred_usage_t usage;
+	gss_key_value_element_desc cc;
+	gss_key_value_set_desc ccset;
+
+	cc.key = "ccache";
+	cc.value = "MEMORY:";
+	ccset.count = 1;
+	ccset.elements = &cc;
+
+	/* Make the proxy credential only available to current process */
+	major = gss_store_cred_into(&minor,
+		cred,
+		GSS_C_INITIATE, /* credential only used for starting libpq connection */
+		GSS_C_NULL_OID, /* store all */
+		true, /* overwrite */
+		true, /* make default */
+		&ccset,
+		&mech,
+		&usage);
+
+
+	if (major != GSS_S_COMPLETE)
+	{
+		pg_GSS_error("gss_store_cred", major, minor);
+	}
+
+	/* quite strange that gss_store_cred doesn't work with "KRB5CCNAME=MEMORY:",
+	 * we have to use gss_store_cred_into instead and set the env for later
+	 * gss_acquire_cred calls. */
+	putenv("KRB5CCNAME=MEMORY:");
+}
diff --git a/src/backend/libpq/be-secure-gssapi.c b/src/backend/libpq/be-secure-gssapi.c
index 316ca65db5..e27d517dea 100644
--- a/src/backend/libpq/be-secure-gssapi.c
+++ b/src/backend/libpq/be-secure-gssapi.c
@@ -497,6 +497,7 @@ secure_open_gssapi(Port *port)
 	bool		complete_next = false;
 	OM_uint32	major,
 minor;
+	gss_cred_id_t	proxy;
 
 	/*
 	 * Allocate subsidiary Port data for GSSAPI operations.
@@ -588,7 +589,8 @@ secure_open_gssapi(Port *port)
 	   GSS_C_NO_CREDENTIAL, &input,
 	   GSS_C_NO_CHANNEL_BINDINGS,
 	   &port->gss->name, NULL, &output, NULL,
-	   NULL, NULL);
+	   NULL, &proxy);
+
 		if (GSS_ERROR(major))
 		{
 			pg_GSS_error(_("could not accept GSSAPI security 

Re: [PATCH] Finally split StdRdOptions into HeapOptions and ToastOptions

2021-07-22 Thread Nikolay Shaplov
In a message of Wednesday, July 14, 2021 15:09:12 MSK, vignesh C
wrote:
> The patch does not apply on Head anymore, could you rebase and post a
> patch. I'm changing the status to "Waiting for Author".

Thank you for notification. 

I've tried to rebase it and found out that some options have been added to
partitioned tables. Handling this needs a careful approach, and I will fix it
in two weeks, when I am back from vacation.


Meanwhile I would strongly suggest to change

{"vacuum_index_cleanup", RELOPT_TYPE_BOOL,

to 

{"vacuum_index_cleanup", RELOPT_TYPE_ENUM,

in src/backend/access/common/reloptions.c

This change should have been done in 3499df0d, but the current implementation
of reloptions is very error-prone, and it is very easy to miss this part.
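
For context, since 3499df0d the option takes an enum at the SQL level rather
than a plain boolean, e.g.:

  ALTER TABLE t SET (vacuum_index_cleanup = auto);  -- also accepts on/off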



-- 
Nikolay Shaplov 
dh...@nataraj.su (e-mail, jabber)  
@dhyan:nataraj.su (matrix)






Re: .ready and .done files considered harmful

2021-07-22 Thread Jeevan Ladhe
Thanks, Dipesh. The patch LGTM.

Some minor suggestions:

+ *
+ * "nextLogSegNo" identifies the next log file to be archived in a log
+ * sequence and the flag "dirScan" specifies a full directory scan to find
+ * the next log file.

IMHO, this comment should go atop pgarch_readyXlog() as a description
of its parameters, and not in pgarch_ArchiverCopyLoop().


 /*
+ * Interrupt handler for archiver
+ *
+ * There is a timeline switch and we have been notified by backend.
+ */

Instead, I would suggest having something like this:

+/*
+ * Interrupt handler for handling the timeline switch.
+ *
+ * A timeline switch has been notified; mark this event so that the next
+ * iteration of pgarch_ArchiverCopyLoop() archives the history file, and we
+ * set the timeline to the new one for the next anticipated log segment.
+ */


Regards,

Jeevan Ladhe

On Thu, Jul 22, 2021 at 12:46 PM Dipesh Pandit 
wrote:

> Hi,
>
> > some comments on v2.
> Thanks for your comments. I have incorporated the changes
> and updated a new patch. Please find the details below.
>
> > On the timeline switch, setting a flag should be enough, I don't think
> > that we need to wake up the archiver.  Because it will just waste the
> > scan cycle.
> Yes, I modified it.
>
> > Why do we need multi level interfaces? I mean instead of calling first
> > XLogArchiveNotifyTLISwitch and then calling PgArchNotifyTLISwitch,
> > can't we directly call PgArchNotifyTLISwitch()?
> Yes, multilevel interfaces are not required. Removed extra interface.
>
> > +if (timeline_switch)
> > +{
> > +/* Perform a full directory scan in next cycle */
> > +dirScan = true;
> > +timeline_switch = false;
> > +}
>
> > I suggest you can add some comments atop this check.
> Added comment to specify the action required in case of a
> timeline switch.
>
> > I think you should use %m in the error message so that it also prints
> > the OS error code.
> Done.
>
> > Why is this a global variable?  I mean whenever you enter the function
> > pgarch_ArchiverCopyLoop(), this can be set to true and after that you
> > can pass this as inout parameter to pgarch_readyXlog() there in it can
> > be conditionally set to false once we get some segment and whenever
> > the timeline switch we can set it back to the true.
> Yes, It is not necessary to have global scope for "dirScan". Changed
> the scope to local for "dirScan" and "nextLogSegNo".
>
> PFA patch v3.
>
> Thanks,
> Dipesh
>


Re: logical replication empty transactions

2021-07-22 Thread Peter Smith
Hi Ajin.

I have reviewed the v8 patch and my feedback comments are below:

//

1. Apply v8 gave multiple whitespace warnings.

--

2. Commit comment - wording

If (when processing a COMMIT / PREPARE message) we find there had been
no other change for that transaction, then do not send the COMMIT /
PREPARE message. This means that pgoutput will skip BEGIN / COMMIT
or BEGIN PREPARE / PREPARE  messages for transactions that are empty.

=>

Shouldn't this also mention some other messages that may be skipped?
- COMMIT PREPARED
- ROLLBACK PREPARED
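
For context, a minimal scenario that produces such an empty transaction
(names hypothetical):

  CREATE PUBLICATION pub FOR TABLE t1;
  BEGIN;
  INSERT INTO t2 VALUES (1);  -- t2 is not in the publication
  COMMIT;                     -- nothing to send; BEGIN/COMMIT can be skipped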

--

3. doc/src/sgml/logicaldecoding.sgml - wording

@@ -884,11 +884,20 @@ typedef void (*LogicalDecodePrepareCB) (struct
LogicalDecodingContext *ctx,
   The required commit_prepared_cb callback is called
   whenever a transaction COMMIT PREPARED has
been decoded.
   The gid field, which is part of the
-  txn parameter, can be used in this callback.
+  txn parameter, can be used in this callback. The
+  parameters prepare_end_lsn and
+  prepare_time can be used to check if the plugin
+  has received this PREPARE TRANSACTION command or not.
+  If yes, it can commit the transaction, otherwise, it can skip the commit.
+  The gid alone is not sufficient to determine this
+  because the downstream may already have a prepared transaction with the
+  same identifier.

=>

Typo: Should that say "downstream node" instead of just "downstream"?

--

4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_txn
callback comment

@@ -406,14 +413,38 @@ pgoutput_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,

 /*
  * BEGIN callback
+ * Don't send BEGIN message here. Instead, postpone it until the first
+ * change. In logical replication, a common scenario is to replicate a set
+ * of tables (instead of all tables) and transactions whose changes were on

=>

Typo: "BEGIN callback" --> "BEGIN callback." (with the period).

And, I think maybe it will be better if it has a separating blank line too.

e.g.

/*
 * BEGIN callback.
 *
 * Don't send BEGIN 

(NOTE: this review comment applies to other callback function comments
too, so please hunt them all down)

--

5. src/backend/replication/pgoutput/pgoutput.c - data / txndata

 static void
 pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
 {
+ PGOutputTxnData*data = MemoryContextAllocZero(ctx->context,
+ sizeof(PGOutputTxnData));

=>

There is some inconsistent naming of the local variable in the patch.
Sometimes it is called "data"; sometimes it is called "txndata"; etc. It
would be better to stick with the same variable name everywhere.

(NOTE: this comment applies to several places in this patch)

--

6. src/backend/replication/pgoutput/pgoutput.c - Strange way to use Assert

+ /* If not streaming, should have setup txndata as part of
BEGIN/BEGIN PREPARE */
+ if (!in_streaming)
+ Assert(txndata);
+

=>

This style of Assert code seemed strange to me. In production mode
isn't that going to evaluate to some condition with a ((void) true)
body? IMO it might be better to just include the streaming check as
part of the Assert. For example:

BEFORE
if (!in_streaming)
Assert(txndata);

AFTER
Assert(in_streaming || txndata);

(NOTE: This same review comment applies in at least 3 places in this
patch, so please hunt them all down)

--

7. src/backend/replication/pgoutput/pgoutput.c - comment wording

@@ -677,6 +810,18 @@ pgoutput_change(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
  Assert(false);
  }

+ /*
+ * output BEGIN / BEGIN PREPARE if we haven't yet,
+ * while streaming no need to send BEGIN / BEGIN PREPARE.
+ */
+ if (!in_streaming && !txndata->sent_begin_txn)

=>

The English of this comment is not quite right, and the comment should
also start with an uppercase letter.

(NOTE: This same comment was in couple of places in the patch)

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: A (but copied many) typo of char-mapping tables

2021-07-22 Thread Peter Eisentraut

On 16.07.21 10:02, Kyotaro Horiguchi wrote:

While I was looking into a .map file in src/backend/utils/mb/Unicode, I
noticed a typo in it.

  > static const pg_mb_radix_tree euc_jp_from_unicode_tree =
  > {
  > ..
  >   0x, /* offset of table for 1-byte inputs */
  > ...
  >   0x0040, /* offset of table for 2-byte inputs */
  > ...
  >   0x02c3, /* offset of table for 3-byte inputs */
  > ...
!>   0x, /* offset of table for 3-byte inputs */
  >   0x00, /* b4_1_lower */
  >   0x00, /* b4_1_upper */
  > ...
  > };

Yeah, the line above prefixed by '!' is apparently a typo of "4-byte
inputs", which comes from a typo in convutils.pm.


fixed, thanks




Re: ORDER BY pushdowns seem broken in postgres_fdw

2021-07-22 Thread David Rowley
On Thu, 22 Jul 2021 at 19:00, Ronan Dunklau  wrote:
> Please find it reattached.

+-- This will not be pushed either
+explain verbose select * from ft2 order by c1 using operator(public.<^);
+  QUERY PLAN
+---
+ Sort  (cost=190.83..193.33 rows=1000 width=142)


Can you also use EXPLAIN (VERBOSE, COSTS OFF), the same as the other
tests in that area?  Having the costs there would never survive a run
of the buildfarm: different hardware will produce different costs, e.g.
32-bit hardware might produce cheaper costs due to narrower widths.
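
For example, mirroring the nearby tests:

  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT * FROM ft2 ORDER BY c1 USING operator(public.<^);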

History lesson: costs off was added so we could test plans. Before
that, I don't think that the regression tests had any coverage for
plans.  Older test files still likely lack much testing with EXPLAIN.

David




Re: postgres_fdw: Handle boolean comparison predicates

2021-07-22 Thread Ronan Dunklau
On Monday, May 31, 2021 at 18:51:57 CEST, Emre Hasegeli wrote:
> > Please add this patch to the commitfest so that it's not forgotten. It
> > will be considered as a new feature so will be considered for commit
> > after the next commitfest.
> 
> I did [1].  You can add yourself as a reviewer.
> 
> > I don't understand why we need to complicate the expressions when
> > sending those to the foreign nodes. Why do we want to send
> > (NOT xyz OR xyz IS NULL) and not as just (xyz IS FALSE).
> > The latter is much more readable and less error-prone. That true for
> > all the BooleanTest deparsing.
> 
> = true/false conditions are normalised.  I thought similar behaviour
> would be expected here.

I agree with Ashutosh: since IS NOT TRUE / FALSE is already a way of
normalizing it, I don't really see what this brings.
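
For reference, the normalized form and the expanded form select the same rows
(t and b hypothetical):

  SELECT * FROM t WHERE b IS NOT TRUE;         -- matches false and NULL
  SELECT * FROM t WHERE (NOT b OR b IS NULL);  -- equivalent expansion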

> 
> > +EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c1 = 100) IS
> > TRUE;-- BooleanTest
> > 
> > Also test a boolean column?
> 
> There isn't a boolean column on the test table currently.

We should probably add one then. 
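
Something like this, perhaps (a sketch; in the postgres_fdw regression tests
ft1 is backed by "S 1"."T 1"):

  ALTER TABLE "S 1"."T 1" ADD COLUMN c9 boolean;
  ALTER FOREIGN TABLE ft1 ADD COLUMN c9 boolean;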



-- 
Ronan Dunklau






Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2021-07-22 Thread David Rowley
On Thu, 22 Jul 2021 at 02:01, Ronan Dunklau  wrote:
> I tested the 0001 patch against both HEAD and my proposed bugfix for
> postgres_fdw.
>
> There is a problem that the ordered aggregate is not pushed down anymore. The
> underlying Sort node is correctly pushed down though.
>
> This comes from the fact that postgres_fdw grouping path never contains any
> pathkey. Since the cost is fuzzily the same between the pushed-down aggregate
> and the locally performed one, the tie is broken against the pathkeys.

I think this might be more down to a lack of any penalty cost for
fetching foreign tuples. Looking at create_foreignscan_path(), I don't
see anything that adds anything to account for fetching the tuples
from the foreign server. If there was something like that then there'd
be more of a preference to perform the remote aggregation so that
fewer rows must arrive from the remote server.

I tested by adding: total_cost += cpu_tuple_cost * rows * 100; in
create_foreignscan_path() and I got the plan with the remote
aggregation. That's a fairly large penalty of 1.0 per row. Much bigger
than parallel_tuple_cost's default value.

I'm a bit undecided on how much this patch needs to get involved in
adjusting foreign scan costs.  The problem is that we've given the
executor a new path to consider and nobody has done any proper
costings for the foreign scan so that it properly prefers paths that
have to pull fewer foreign tuples.  This is a pretty similar problem
to what parallel_tuple_cost aims to fix. Also similar to how we had to
add APPEND_CPU_COST_MULTIPLIER to have partition-wise aggregates
prefer grouping at the partition level rather than at the partitioned
table level.

> Ideally we would add the group pathkeys to the grouping path, but this would
> add an additional ORDER BY expression matching the GROUP BY. Moreover, some
> triaging of the pathkeys would be necessary since we now mix the sort-in-
> aggref pathkeys with the group_pathkeys.

I think you're talking about passing pathkeys into
create_foreign_upper_path in add_foreign_grouping_paths. If so, I
don't really see how it would be safe to add pathkeys to the foreign
grouping path. What if the foreign server did a Hash Aggregate?  The
rows might come back in any random order.

I kinda think that to fix this properly would need a new foreign
server option such as foreign_tuple_cost. I'd feel better about
something like that with some of the people with a vested interest in
the FDW code were watching more closely. So far we've not managed to
entice any of them with the bug report yet, but it's maybe early days
yet.

David



