Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o
On Tue, Aug 17, 2021 at 10:54 AM Amit Kapila wrote:
>
> On Mon, Aug 16, 2021 at 8:18 PM Dilip Kumar wrote:
> >
> > On Fri, Aug 13, 2021 at 9:29 PM Andres Freund wrote:
> > >
> > > > I think we can extend this API but I guess it is better to then do it
> > > > for dsm-based as well so that these get tracked via resowner.
> > >
> > > DSM segments are resowner managed already, so it's not obvious that
> > > that'd buy us much? Although I guess there could be a few edge cases
> > > that'd look cleaner, because we could reliably trigger cleanup in the
> > > leader instead of relying on dsm detach hooks + refcounts to manage
> > > when a set is physically deleted?
> >
> > In an off-list discussion with Thomas and Amit, we tried to discuss
> > how to clean up the shared fileset in the current use case. Thomas
> > suggested that instead of using an individual shared fileset for
> > storing the data for each XID, why don't we just create a single
> > shared fileset for the complete worker lifetime, and when the worker
> > is exiting we can just remove that shared fileset. And for storing
> > XID data, we can just create the files under the same shared fileset
> > and delete those files when we no longer need them. I like this idea
> > and it looks much cleaner; after this, we can get rid of the special
> > cleanup mechanism using 'filesetlist'. I have attached a patch for
> > the same.
>
> It seems to me that this idea would obviate any need for resource
> owners as we will have only one fileset now. I have a few initial
> comments on the patch:

One more comment:

@@ -2976,39 +2952,17 @@ subxact_info_write(Oid subid, TransactionId xid)
..
+ /* Try to open the subxact file, if it doesn't exist then create it */
+ fd = BufFileOpenShared(xidfileset, path, O_RDWR, true);
+ if (fd == NULL)
+     fd = BufFileCreateShared(xidfileset, path);
..

Instead of trying to create the file here based on whether it exists or not, can't we create it in subxact_info_add, where we first allocate memory for subxacts? If that works, then in the above code the file should always exist.

--
With Regards,
Amit Kapila.
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Mon, Aug 16, 2021 at 12:06:16PM +0200, Michael Meskes wrote:
> Your patch removes the warning but by doing that also removes the
> feature that is being tested.

Oops. If kept this way, this test scenario is going to need a comment to explain exactly that.

> I'm not sure what's the best way to go about it. Shall we accept not
> testing this particular feature and remove the warning? After all, this
> is not the way the statement should be used, hence the warning. Or
> should we keep it in and redirect the warning? In that case, we would
> also lose other warnings that are not planned, though.

FWIW, I would tend to drop the warning here. I am not sure that this is an interesting enough use case. My 2c.
--
Michael
Re: Diagnostic comment in LogicalIncreaseXminForSlot
Hi Amit and Andres,

Here's the updated patch.

On Mon, Aug 9, 2021 at 11:14 AM Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote:
>
> On Sat, Aug 7, 2021 at 11:40 AM Andres Freund wrote:
> >
> > Hi,
> >
> > On 2021-07-12 17:28:15 +0530, Ashutosh Bapat wrote:
> > > On Mon, Jul 12, 2021 at 8:39 AM Amit Kapila wrote:
> > >
> > > > On Mon, Jul 5, 2021 at 12:54 PM Masahiko Sawada <sawada.m...@gmail.com>
> > > > Today, again looking at this patch, I don't think doing elog inside
> > > > spinlock is a good idea. We can either release spinlock before it or
> > > > use some variable to remember that we need to write such an elog and
> > > > do it after releasing the lock. What do you think?
> > >
> > > The elog will be effective only under DEBUG1, otherwise it will be
> > > almost a NOOP. I am wondering whether it's worth adding a bool
> > > assignment and moving the elog out only for DEBUG1. Anyway, will
> > > defer it to you.
> >
> > It's *definitely* not ok to do an elog() while holding a spinlock.
> > Consider what happens if the elog tries to format the message and runs
> > out of memory. Or if elog detects the stack depth is too deep. An ERROR
> > would be thrown, and we'd lose track of the held spinlock.
>
> Thanks for the clarification.
>
> Amit,
> I will provide the patch changed accordingly.
>
> --
> Best Wishes,
> Ashutosh

--
Best Wishes,
Ashutosh

From 0a2f2c1f530d72219d9db7bd1ff77c1ae8c4ffe4 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat
Date: Thu, 13 May 2021 13:40:33 +0530
Subject: [PATCH] Report new catalog_xmin candidate in LogicalIncreaseXminForSlot()

Similar to LogicalIncreaseRestartDecodingForSlot(), add a debug message to
LogicalIncreaseXminForSlot() reporting a new catalog_xmin candidate.
---
 src/backend/replication/logical/logical.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 64b8280c13..6c8c387e73 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1565,6 +1565,7 @@ LogicalIncreaseXminForSlot(XLogRecPtr current_lsn, TransactionId xmin)
 {
 	bool		updated_xmin = false;
 	ReplicationSlot *slot;
+	bool		got_new_xmin = false;

 	slot = MyReplicationSlot;

@@ -1602,12 +1603,22 @@ LogicalIncreaseXminForSlot(XLogRecPtr current_lsn, TransactionId xmin)
 		{
 			slot->candidate_catalog_xmin = xmin;
 			slot->candidate_xmin_lsn = current_lsn;
+
+			/*
+			 * Add a line about new xmin in server logs at an appropriate log
+			 * level after releasing the spinlock.
+			 */
+			got_new_xmin = true;
 		}
 		SpinLockRelease(&slot->mutex);

 	/* candidate already valid with the current flush position, apply */
 	if (updated_xmin)
 		LogicalConfirmReceivedLocation(slot->data.confirmed_flush);
+
+	if (got_new_xmin)
+		elog(DEBUG1, "got new catalog_xmin %u at %X/%X", xmin,
+			 LSN_FORMAT_ARGS(current_lsn));
 }

 /*
--
2.25.1
Re: A problem in ExecModifyTable
On Tue, 17 Aug 2021 at 15:56, 李杰(慎追) wrote:
> According to my understanding, the parent table of a partitioned table
> does not store any tuples.
> Then why is "relkind == RELKIND_PARTITIONED_TABLE" suddenly added here?

We'll need some sort of ResultRelInfo in the case that all partitions are pruned. Using the one for the partitioned table is convenient. I believe that's required if there are any statement-level triggers on the partitioned table or if there's a RETURNING clause.

> There is no comment on this point in the code.
> Can you answer my confusion? Be deeply grateful.

Yeah, maybe. It's not exactly close by, but one in grouping_planner mentions this:

    /*
     * We managed to exclude every child rel, so generate a
     * dummy one-relation plan using info for the top target
     * rel (even though that may not be a leaf target).
     * Although it's clear that no data will be updated or
     * deleted, we still need to have a ModifyTable node so
     * that any statement triggers will be executed. (This
     * could be cleaner if we fixed nodeModifyTable.c to allow
     * zero target relations, but that probably wouldn't be a
     * net win.)
     */

David
Re: Skipping logical replication transactions on subscriber side
On Tue, Aug 17, 2021 at 10:46 AM Masahiko Sawada wrote:
>
> On Mon, Aug 16, 2021 at 5:30 PM Greg Nancarrow wrote:
> >
> > On Mon, Aug 16, 2021 at 5:54 PM houzj.f...@fujitsu.com wrote:
> > >
> > > Here is another comment:
> > >
> > > +char *
> > > +logicalrep_message_type(LogicalRepMsgType action)
> > > +{
> > > ...
> > > +       case LOGICAL_REP_MSG_STREAM_END:
> > > +           return "STREAM END";
> > > ...
> > >
> > > I think most of the existing code uses "STREAM STOP" to describe the
> > > LOGICAL_REP_MSG_STREAM_END message; is it better to return "STREAM
> > > STOP" in function logicalrep_message_type() too?
> >
> > +1
> > I think you're right, it should be "STREAM STOP" in that case.
>
> It's right that we use "STREAM STOP" rather than "STREAM END" in many
> places such as elog messages, a callback name, and source code
> comments. As far as I have found, there are two places where we're
> using "STREAM END": LOGICAL_REP_MSG_STREAM_END and a description in
> doc/src/sgml/protocol.sgml. Isn't it better to fix these
> inconsistencies in the first place? I think "STREAM STOP" would be
> more appropriate.

I think keeping STREAM_END in the enum 'LOGICAL_REP_MSG_STREAM_END' seems a bit better because of the value 'E' we use for it.
--
With Regards,
Amit Kapila.
Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o
On Mon, Aug 16, 2021 at 8:18 PM Dilip Kumar wrote:
>
> On Fri, Aug 13, 2021 at 9:29 PM Andres Freund wrote:
> >
> > > I think we can extend this API but I guess it is better to then do it
> > > for dsm-based as well so that these get tracked via resowner.
> >
> > DSM segments are resowner managed already, so it's not obvious that
> > that'd buy us much? Although I guess there could be a few edge cases
> > that'd look cleaner, because we could reliably trigger cleanup in the
> > leader instead of relying on dsm detach hooks + refcounts to manage
> > when a set is physically deleted?
>
> In an off-list discussion with Thomas and Amit, we tried to discuss
> how to clean up the shared fileset in the current use case. Thomas
> suggested that instead of using an individual shared fileset for
> storing the data for each XID, why don't we just create a single
> shared fileset for the complete worker lifetime, and when the worker
> is exiting we can just remove that shared fileset. And for storing
> XID data, we can just create the files under the same shared fileset
> and delete those files when we no longer need them. I like this idea
> and it looks much cleaner; after this, we can get rid of the special
> cleanup mechanism using 'filesetlist'. I have attached a patch for
> the same.

It seems to me that this idea would obviate any need for resource owners as we will have only one fileset now. I have a few initial comments on the patch:

1.
+ /* do cleanup on worker exit (e.g. after DROP SUBSCRIPTION) */
+ on_shmem_exit(worker_cleanup, (Datum) 0);

It should be registered with the before_shmem_exit() hook to allow sending stats for file removal.

2. After these changes, the comments atop stream_open_file and SharedFileSetInit need to be changed.

3. In function subxact_info_write(), we are computing the subxact file path twice, which doesn't seem to be required.

4.
+ if (missing_ok)
+     return NULL;

  ereport(ERROR,
          (errcode_for_file_access(),
-          errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
+          errmsg("could not open temporary file \"%s\" from BufFile \"%s\": %m",
                  segment_name, name)));

There seems to be a formatting issue with errmsg. Also, it is better to keep an empty line before ereport.

5. How can we provide a strict mechanism to not allow using dsm APIs for a non-dsm FileSet? One idea could be that we have a variable (probably bool) in the SharedFileSet structure which will be initialized in SharedFileSetInit based on whether the caller has provided a dsm segment. Then in other DSM-based APIs, we can check if it is used for the wrong type.
--
With Regards,
Amit Kapila.
Re: Skipping logical replication transactions on subscriber side
On Mon, Aug 16, 2021 at 5:30 PM Greg Nancarrow wrote:
>
> On Mon, Aug 16, 2021 at 5:54 PM houzj.f...@fujitsu.com wrote:
> >
> > Here is another comment:
> >
> > +char *
> > +logicalrep_message_type(LogicalRepMsgType action)
> > +{
> > ...
> > +       case LOGICAL_REP_MSG_STREAM_END:
> > +           return "STREAM END";
> > ...
> >
> > I think most of the existing code uses "STREAM STOP" to describe the
> > LOGICAL_REP_MSG_STREAM_END message; is it better to return "STREAM
> > STOP" in function logicalrep_message_type() too?
>
> +1
> I think you're right, it should be "STREAM STOP" in that case.

It's right that we use "STREAM STOP" rather than "STREAM END" in many places such as elog messages, a callback name, and source code comments. As far as I have found, there are two places where we're using "STREAM END": LOGICAL_REP_MSG_STREAM_END and a description in doc/src/sgml/protocol.sgml. Isn't it better to fix these inconsistencies in the first place? I think "STREAM STOP" would be more appropriate.

Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Re: Allow parallel DISTINCT
On Wed, 11 Aug 2021 at 16:51, David Rowley wrote:
> The patch is just some plumbing work to connect all the correct paths
> up to make it work. It's all fairly trivial.

I looked at this patch again and realised that it could be done a bit better. For example, the previous version set the distinct_rel's FDW fields twice: once when making the serial paths and once when finalizing the partial paths.

I've now added two new functions: create_final_distinct_paths and create_partial_distinct_paths. The responsibility of create_distinct_paths has changed. Instead of creating the non-parallel DISTINCT paths itself, it calls the two new functions and also takes charge of calling the create_upper_paths_hook for UPPERREL_DISTINCT, plus the FDW GetForeignUpperPaths() call. I think this is nicer, as I'd previously added a new parameter to create_distinct_paths() so I could tell it not to call the hook, since I didn't want to call that twice on the same relation; it would no doubt result in some plugin just creating the same paths again.

I've also changed my mind about my previous choice not to call GetForeignUpperPaths for UPPERREL_PARTIAL_DISTINCT. I now think that's ok.

I think this is a fairly trivial patch that just does a bit of wiring up of paths. Unless anyone has anything to say about it in the next few days, I'll be looking at it again with the intention of pushing it.

David
Re: Skipping logical replication transactions on subscriber side
On Mon, Aug 16, 2021 at 3:59 PM houzj.f...@fujitsu.com wrote:
>
> On Thu, Aug 12, 2021 1:53 PM Masahiko Sawada wrote:
> > I've attached the updated patches. FYI I've included the patch
> > (v8-0005) that fixes the assertion failure during shared fileset
> > cleanup to make cfbot tests happy.
>
> Hi,
>
> Thanks for the new patches.
> I have a few comments on the v8-0001 patch.

Thank you for the comments!

> 2)
> +/*
> + * Get string representing LogicalRepMsgType.
> + */
> +char *
> +logicalrep_message_type(LogicalRepMsgType action)
> +{
> ...
> +
> +    elog(ERROR, "invalid logical replication message type \"%c\"", action);
> +}
>
> Some old compilers might complain that the function doesn't have a
> return value at the end of the function; maybe we can write it like
> the following:
>
> +char *
> +logicalrep_message_type(LogicalRepMsgType action)
> +{
> +    switch (action)
> +    {
> +        case LOGICAL_REP_MSG_BEGIN:
> +            return "BEGIN";
> ...
> +        default:
> +            elog(ERROR, "invalid logical replication message type \"%c\"", action);
> +    }
> +    return NULL;    /* keep compiler quiet */
> +}

Fixed.

> 3)
> Do we need to invoke set_apply_error_context_xact() in the function
> apply_handle_stream_prepare() to save the xid and timestamp?

Yes. I think that the v8-0001 patch already sets the xid and timestamp just after parsing the stream_prepare message. Did you mean it's not necessary?

I'll submit the updated patches soon.

Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Re: return correct error code from pgtls_init
On Tue, Jun 01, 2021 at 06:56:42PM -0700, Zhihong Yu wrote:
> Looking at the -1 return, e.g.
>
>     pq_lockarray = malloc(sizeof(pthread_mutex_t) * CRYPTO_num_locks());
>
> when pq_lockarray is NULL, we can return errno.
>
> I didn't change the call to pthread_mutex_lock() because PGTHREAD_ERROR()
> is used, which aborts.

I am not sure what you mean here, and there is nothing wrong with this code as far as I know, as we would let the caller of pgtls_init() know that something is wrong.
--
Michael
A problem in ExecModifyTable
Hi hackers,

Recently, I noticed a great patch in PG 14: "Rework planning and execution of UPDATE and DELETE" (86dc90056dfdbd9d1b891718d2e5614e3e432f35). This changes the DML execution of partitioned tables and makes it more friendly. But I am very confused about the following changes:

```
+		relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
+		if (relkind == RELKIND_RELATION ||
+			relkind == RELKIND_MATVIEW ||
+			relkind == RELKIND_PARTITIONED_TABLE)
 		{
-			char		relkind;
-			Datum		datum;
-			bool		isNull;
-
-			relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
-			if (relkind == RELKIND_RELATION || relkind == RELKIND_MATVIEW)
-			{
-				datum = ExecGetJunkAttribute(slot,
-											 junkfilter->jf_junkAttNo,
-											 &isNull);
-				/* shouldn't ever get a null result... */
```

According to my understanding, the parent table of a partitioned table does not store any tuples. Then why is "relkind == RELKIND_PARTITIONED_TABLE" suddenly added here? There is no comment on this point in the code. Can you answer my confusion? Be deeply grateful.

Regards & Thanks,
Adger
Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode
On Mon, Aug 16, 2021 at 02:06:31PM -0300, Ranier Vilela wrote:
> uint64 and size_t in 64 bits are equivalent.
> uint64 and size_t in 32 bits can be weird, but anyway size_t at 32 bits
> can handle 1GB.

There is usually a reason why things are done as they are, so before suggesting changing something I would advise reading the threads that created those changes more carefully, because the reasons could be mentioned there. In this case, we are talking about aef8948, and this part of its related thread:
https://www.postgresql.org/message-id/20210105034739.gg7...@momjian.us

> base64.c can be made in another patch.

This file is a set of code paths used by authentication and SCRAM; it will never get as hot as the code paths that Hans has reported, as these are one-time operations. Please note as well cfc40d3 for the reasons why things are handled this way. We absolutely have to use safe routines here.
--
Michael
Re: Skipping logical replication transactions on subscriber side
On Thu, Aug 12, 2021 at 3:54 PM Masahiko Sawada wrote:
>
> I've attached the updated patches. FYI I've included the patch
> (v8-0005) that fixes the assertion failure during shared fileset
> cleanup to make cfbot tests happy.

Another comment on the 0001 patch: as there is now a mix of setting "apply_error_callback_arg" members directly and also through inline functions, it might look better to have it done consistently, with functions having prototypes something like the following:

static inline void set_apply_error_context_rel(LogicalRepRelMapEntry *rel);
static inline void reset_apply_error_context_rel(void);
static inline void set_apply_error_context_attnum(int remote_attnum);

Regards,
Greg Nancarrow
Fujitsu Australia
Re: Some RELKIND macro refactoring
On Mon, Aug 16, 2021 at 10:22:50AM -0400, Alvaro Herrera wrote:
> Partitioned tables are not listed here, but IIRC there's a patch to let
> partitioned tables have a table AM so that their partitions inherit it.

This was raised in the thread for ALTER TABLE SET ACCESS METHOD (see patch 0002):
https://www.postgresql.org/message-id/20210308010707.ga29...@telsasoft.com

I am not sure whether we'd want to do that for table AMs, as property inheritance combined with partitioned tables has always been a sensitive topic for any property, but if we discuss this it should be on a new thread.
--
Michael
Re: Is it worth pushing conditions to sublink/subplan?
> On Aug 16, 2021, at 17:15, Wenjing wrote:
>
> Hi Hackers,
>
> Recently, an issue has been bothering me. This is about conditional
> push-down in SQL. I use cases from regression testing as an example.
> I found that the condition (b = 1) can be pushed down into the
> subquery; however, it cannot be pushed down to a sublink/subplan.
> If a sublink/subplan clause contains a partitioned table, it can be
> useful to get the conditions for pruning.
> So, is it worth pushing conditions down to sublink/subplan?
> Anybody have any ideas?
>
> regards,
> Wenjing
>
> example:
> create table p (a int, b int, c int) partition by list (a);
> create table p1 partition of p for values in (1);
> create table p2 partition of p for values in (2);
> create table q (a int, b int, c int) partition by list (a);
> create table q1 partition of q for values in (1) partition by list (b);
> create table q11 partition of q1 for values in (1) partition by list (c);
> create table q111 partition of q11 for values in (1);
> create table q2 partition of q for values in (2) partition by list (b);
> create table q21 partition of q2 for values in (1);
> create table q22 partition of q2 for values in (2);
> insert into q22 values (2, 2, 3);

Sorry, I messed up the structure of the table. It should be:

create table ab (a int not null, b int not null) partition by list (a);
create table ab_a2 partition of ab for values in (2) partition by list (b);
create table ab_a2_b1 partition of ab_a2 for values in (1);
create table ab_a2_b2 partition of ab_a2 for values in (2);
create table ab_a2_b3 partition of ab_a2 for values in (3);
create table ab_a1 partition of ab for values in (1) partition by list (b);
create table ab_a1_b1 partition of ab_a1 for values in (1);
create table ab_a1_b2 partition of ab_a1 for values in (2);
create table ab_a1_b3 partition of ab_a1 for values in (3);
create table ab_a3 partition of ab for values in (3) partition by list (b);
create table ab_a3_b1 partition of ab_a3 for values in (1);
create table ab_a3_b2 partition of ab_a3 for values in (2);
create table ab_a3_b3 partition of ab_a3 for values in (3);

> postgres=# explain (costs off)
> postgres-# select temp.b from
> postgres-# (
> postgres(#   select a,b from ab x where x.a = 1
> postgres(#   union all
> postgres(#   (values(1,1))
> postgres(# ) temp,
> postgres-# ab y
> postgres-# where y.b = temp.b and y.a = 1 and y.b=1;
>                  QUERY PLAN
> ---------------------------------------------
>  Nested Loop
>    ->  Seq Scan on ab_a1_b1 y
>          Filter: ((b = 1) AND (a = 1))
>    ->  Append
>          ->  Subquery Scan on "*SELECT* 1"
>                ->  Seq Scan on ab_a1_b1 x
>                      Filter: ((a = 1) AND (b = 1))
>          ->  Result
> (8 rows)
>
> The condition (b = 1) can be pushed down into the subquery.
>
> postgres=# explain (costs off)
> postgres-# select
> postgres-#   y.a,
> postgres-#   (select x.b from ab x where y.a =x.a and y.b=x.b) as b
> postgres-# from ab y where a = 1 and b = 1;
>                  QUERY PLAN
> ---------------------------------------------
>  Seq Scan on ab_a1_b1 y
>    Filter: ((a = 1) AND (b = 1))
>    SubPlan 1
>      ->  Append
>            ->  Seq Scan on ab_a1_b1 x_1
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a1_b2 x_2
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a1_b3 x_3
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a2_b1 x_4
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a2_b2 x_5
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a2_b3 x_6
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a3_b1 x_7
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a3_b2 x_8
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a3_b3 x_9
>                  Filter: ((y.a = a) AND (y.b = b))
> (22 rows)
>
> The conditions (b = 1 and a = 1) cannot be pushed down to the
> sublink/subplan in the targetlist.
>
> postgres=# explain (costs off)
> postgres-# select y.a
> postgres-# from ab y
> postgres-# where
> postgres-#   (select x.a > x.b from ab x where y.a =x.a and y.b=x.b) and
> postgres-#   y.a = 1 and y.b = 1;
>                  QUERY PLAN
> ---------------------------------------------
>  Seq Scan on ab_a1_b1 y
>    Filter: ((a = 1) AND (b = 1) AND (SubPlan 1))
>    SubPlan 1
>      ->  Append
>            ->  Seq Scan on ab_a1_b1 x_1
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a1_b2 x_2
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a1_b3 x_3
>                  Filter: ((y.a = a) AND (y.b = b))
>            ->  Seq Scan on ab_a2_b1 x_4
>                  Filter:
Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode
On Sun, Aug 15, 2021 at 02:58:59PM +0000, Hans Buschmann wrote:
> If it seems useful somebody could enter it as an open item /
> resolved item for pg14 after beta 3.

Hmm. Using SQL like the following to stress pg_hex_encode(), I can see a measurable difference easily, so there is no need for pg_dump or an instance with many LOs:

set work_mem = '1GB';
with hex_values as materialized
  (select repeat(i::text, N)::bytea as i from generate_series(1, M) t(i))
SELECT count(encode(i, 'hex')) from hex_values;

With N = 1000, M = 10, perf shows a 34.88% use of pg_hex_encode(). On REL_13_STABLE, I get down to ~27% for hex_encode(), the same backend routine. The runtime numbers are more interesting: HEAD gets up to 1600ms. With the latest version of the patch, we get down to 1550ms. Now, a simple revert of ccf4e277 and aef8948 brings me down to the same runtimes as ~13, closer to 1450ms. There seems to be some noise in the tests, but the difference is clear.

Considering that commit aef8948 also involved a cleanup of src/common/hex_decode.c, I think that it would be saner to also revert c3826f83, so that we have a clean state of the code to work on should the work on the data encryption patch set move on in the future. It is not decided that this would actually use any of the hex decode/encode code, either.

Honestly, I don't like much messing with this stuff after beta3, and one of the motives for moving the hex handling code to src/common/ was the data encryption code, whose state is not known as of late. This does not prevent working on such refactorings in the future, of course, keeping the performance impact in mind.

In short, I am planning to address this regression with the attached, a combined revert of 0d70d30, 5c33ba5 and 92436a7.
--
Michael

From 55ca0af0d5e40a57b468dac21c0b1ad294a96df5 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Tue, 17 Aug 2021 10:57:02 +0900
Subject: [PATCH] Revert refactoring of hex code to src/common/

This is a combined revert of aef8948, ccf4e27, c3826f8.
---
 src/include/common/hex.h                  |  25 ---
 src/include/common/sha2.h                 |   4 +
 src/include/utils/builtins.h              |   4 +
 src/backend/replication/backup_manifest.c |  30 ++--
 src/backend/utils/adt/encode.c            | 158 +++---
 src/backend/utils/adt/varlena.c           |  15 +-
 src/common/Makefile                       |   1 -
 src/common/hex.c                          | 192 --
 src/tools/msvc/Mkvcbuild.pm               |   2 +-
 9 files changed, 127 insertions(+), 304 deletions(-)
 delete mode 100644 src/include/common/hex.h
 delete mode 100644 src/common/hex.c

diff --git a/src/include/common/hex.h b/src/include/common/hex.h
deleted file mode 100644
index 150771a14d..00
--- a/src/include/common/hex.h
+++ /dev/null
@@ -1,25 +0,0 @@
-/*
- *
- * hex.h
- *	  Encoding and decoding routines for hex strings.
- *
- * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
- * Portions Copyright (c) 1994, Regents of the University of California
- *
- * IDENTIFICATION
- *	  src/include/common/hex.h
- *
- *
- */
-
-#ifndef COMMON_HEX_H
-#define COMMON_HEX_H
-
-extern uint64 pg_hex_decode(const char *src, size_t srclen,
-							char *dst, size_t dstlen);
-extern uint64 pg_hex_encode(const char *src, size_t srclen,
-							char *dst, size_t dstlen);
-extern uint64 pg_hex_enc_len(size_t srclen);
-extern uint64 pg_hex_dec_len(size_t srclen);
-
-#endif							/* COMMON_HEX_H */

diff --git a/src/include/common/sha2.h b/src/include/common/sha2.h
index dfeee6bceb..f4bae35af1 100644
--- a/src/include/common/sha2.h
+++ b/src/include/common/sha2.h
@@ -18,11 +18,15 @@
 /*** SHA224/256/384/512 Various Length Definitions ***/
 #define PG_SHA224_BLOCK_LENGTH			64
 #define PG_SHA224_DIGEST_LENGTH			28
+#define PG_SHA224_DIGEST_STRING_LENGTH	(PG_SHA224_DIGEST_LENGTH * 2 + 1)
 #define PG_SHA256_BLOCK_LENGTH			64
 #define PG_SHA256_DIGEST_LENGTH			32
+#define PG_SHA256_DIGEST_STRING_LENGTH	(PG_SHA256_DIGEST_LENGTH * 2 + 1)
 #define PG_SHA384_BLOCK_LENGTH			128
 #define PG_SHA384_DIGEST_LENGTH			48
+#define PG_SHA384_DIGEST_STRING_LENGTH	(PG_SHA384_DIGEST_LENGTH * 2 + 1)
 #define PG_SHA512_BLOCK_LENGTH			128
 #define PG_SHA512_DIGEST_LENGTH			64
+#define PG_SHA512_DIGEST_STRING_LENGTH	(PG_SHA512_DIGEST_LENGTH * 2 + 1)

 #endif							/* _PG_SHA2_H_ */

diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index f3ce4fb173..40fcb0ab6d 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -31,6 +31,10 @@ extern void domain_check(Datum value, bool isnull, Oid domainType,
 extern int	errdatatype(Oid datatypeOid);
 extern int	errdomainconstraint(Oid datatypeOid, const char *conname);

+/* encode.c */
+extern uint64 hex_encode(const char *src, size
Re: Added schema level support for publication.
On Mon, Aug 16, 2021 at 11:31 PM Tom Lane wrote:
>
> Peter Smith writes:
> > Then the question from Peter E. [2] "Why can't I have a publication
> > that publishes tables t1, t2, t3, *and* schemas s1, s2, s3." would
> > have an intuitive solution like:
>
> > CREATE PUBLICATION pub1
> >     FOR TABLE t1,t2,t3 AND
> >     FOR ALL TABLES IN SCHEMA s1,s2,s3;
>
> That seems a bit awkward, since the existing precedent is
> to use commas. We shouldn't need more than one FOR noise-word,
> either. So I was imagining syntax more like, say,

When I wrote that "AND" suggestion I had in mind that commas may get weird if there were objects with keyword names, e.g. if there were a schema called SEQUENCE and a SEQUENCE called SEQUENCE, then this would be allowed:

CREATE PUBLICATION pub1
    FOR ALL TABLES IN SCHEMA SEQUENCE, SEQUENCE SEQUENCE;

But probably I was just overthinking it.

> CREATE PUBLICATION pub1 FOR
>     TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2,
>     SEQUENCE seq1,seq2, ALL SEQUENCES IN SCHEMA s3,s4;
>
> Abstractly it'd be
>
> createpub := CREATE PUBLICATION pubname FOR cpitem [, ... ] [ WITH ... ]
>
> cpitem := ALL TABLES |
>           TABLE name |
>           ALL TABLES IN SCHEMA name |
>           ALL SEQUENCES |
>           SEQUENCE name |
>           ALL SEQUENCES IN SCHEMA name |
>           name
>
> The grammar output would need some post-analysis to attribute the
> right type to bare "name" items, but that doesn't seem difficult.

That last bare "name" cpitem looks like it would permit the following syntax:

CREATE PUBLICATION pub FOR a,b,c;

Was that intentional?
--
Kind Regards,
Peter Smith.
Fujitsu Australia.
Re: The Free Space Map: Problems and Opportunities
On Mon, Aug 16, 2021 at 5:15 PM Peter Geoghegan wrote:
> Another concern is knock-on effects *after* any initial problematic
> updates -- that's certainly not where it ends. Every action has
> consequences, and these consequences themselves have consequences. By
> migrating a row from an earlier block into a later block (both
> physically and temporally early/late), we're not just changing things
> for that original order row or rows (the order non-HOT-updated by the
> delivery transaction).

To be clear, TPC-C looks like this: each order row and each order line row will be inserted just once, and then later updated just once. But that's it, forever -- no more modifications. Both tables grow and grow for as long as the benchmark runs.

Users with workloads like this will naturally expect that performance will be steady over time, even over days or weeks. But that's not what we see. What we actually see is that the FSM can never quite resist the temptation to dirty older pages that should be aging out. And so performance degrades over days and weeks -- that's how long Jan has said that it can take.

The FSM does have a bias in favor of using earlier pages over later pages, in order to make relation truncation by VACUUM more likely in general. That bias certainly isn't helping us here, and might be another thing that hurts us. I think that the fundamental problem is that the FSM just doesn't have any way of recognizing that its behavior is penny wise, pound foolish.

I don't believe that there is any fundamental reason why Postgres can't have *stable* long-term performance for this workload, in the way that it's really expected to. That seems possible within the confines of the existing design for HOT and VACUUM, which already work very well for the first few hours.
--
Peter Geoghegan
Re: The Free Space Map: Problems and Opportunities
On Mon, Aug 16, 2021 at 3:49 PM Bruce Momjian wrote:
> OK, here is my feedback. First, I understand the space
> reservation/soccer idea, but I am also concerned that adding space
> reservation could improve some things and make others worse.

That is definitely a legitimate concern. There is a trade-off here: if we do too much preallocation, there may be cases where the preallocated space that we expected to be used quickly is never used by anybody. But that doesn't seem like a problem, provided we don't lose track of the fact that it happened. Then the worst that can happen is that we have empty pages that nobody will ever use, because nobody inserts into the table ever again (not the backend with the leaked space lease, not anybody else). That seems manageable. We can add some kind of ramp-up behavior that gets more and more aggressive as demand for new space from backends is steady or increasing.

> Second, let's look at a concrete example, and see how it can be improved
> more simply. As I understand it, there are three operations that need
> free space:
>
> 1. INSERT/COPY
> 2. UPDATE where there is insufficient space on the page of the old row
> 3. UPDATE where there is sufficient page space

Right.

> The third option already can use the existing page for a HOT update, so
> the FSM doesn't matter.

I agree. All good questions. Thank you for diving in on this.

> For 1 & 2, suppose you have a table with 10 8k pages, all 80% full. As
> I understand it, the first page will go from 80% to 81%, then the next
> row addition will take the second page from 80% to 81%, until all pages
> are 81%, and then it starts over again. Is that accurate?

No. Generally backends have their own target block (a simple BlockNumber) that they store in the relcache, one per recently modified table. Backends/heapam use RelationGetTargetBlock() to access this local cache. So there is "backend affinity" for particular individual heap pages, even today.

However, that still has many problems. It doesn't make sense to have a local cache for a shared resource -- that's the problem. You actually need some kind of basic locking or lease system, so that 10 backends don't all decide at the same time that one particular heap block is fully empty, and therefore a good target block for that backend alone. It's as if the backends believe that they're special little snowflakes, and that no other backend could possibly be thinking the same thing at the same time about the same heap page. And so when TPC-C does its initial bulk insert, distinct orders are already shuffled together in a hodge-podge, just because concurrent bulk inserters all insert on the same heap pages.

If we could safely give out 10 or 50 pages directly to each backend, then clearly the related data would be stored together in this scenario -- each backend would be able to work through its lease of contiguous heap pages for quite a while before revisiting the FSM/relation extension. The trick is to teach the implementation to do that without allowing the backend to permanently leak whole entire leases with maybe dozens of free pages.

Systems like DB2 and Oracle probably can't have this problem. Even the bulk lease case is just an extension of something they had to do anyway. You see, some kind of FSM lease system that knows about transaction lifetime for any given piece of free space is strictly necessary with UNDO-based rollback. Without that, transaction rollback breaks in certain edge cases involving concurrent inserts into the same page, right before a rollback that needs to put an updated version back in place. If the updated version from undo happens to be physically larger than the now-aborted successor version, and if there is little or no space left to cover that, what can rollback do about it? Nothing. It's a nasty bug. They use heavyweight locks to avoid this.

> Is that the non-temporal memory issue you are concerned about?

When I talk about memory or time, what I'm usually referring to is the ability to manage larger allocations of multiple blocks for a while after they're originally requested, so that a requesting transaction/backend is given the opportunity to use the space that it asked for. Backends shouldn't be completely trusted, though. We should trust but verify.

> Doesn't spreading the new rows out increase the ability to do
> HOT updates?

It's true that the problem scenario with TPC-C does involve updates, and it's true that that's when things really go badly. But I deliberately haven't gone into the role that the updates play there too much, not just yet. It's important to note that the problems really do start when we insert, even if that's not obvious -- that's when the locality is lost, right when the original order transaction comes in. We lose locality just because we have concurrent inserters into the same table, where each transaction inserts multiple related rows. We fail to group related rows
Re: archive status ".ready" files may be created too early
So why do we call this structure "record boundary map", when the boundaries it refers to are segment boundaries? I think we should call it "segment boundary map" instead ... and also I think we should use that name in the lock that protects it, so instead of ArchNotifyLock, it could be SegmentBoundaryLock or perhaps WalSegmentBoundaryLock. The reason for the latter is that I suspect the segment boundary map will also have a use in fixing the streaming replication issue I mentioned earlier in the thread. This also makes me think that we'll want the WAL record *start* address to be in the hash table too, not just its *end* address. So we'll use start-1 as the position to send, rather than the end-of-segment when GetFlushRecPtr() returns that. As for 0x, I think it would be cleaner to do a #define MaxXLogSegNo with that value in the line immediately after typedef XLogSegNo, rather than use the numerical value directly in the assignment. Typo in comment atop RemoveRecordBoundariesUpTo: it reads "up to an", should read "up to and". I think the API of GetLatestRecordBoundarySegment would be better by returning the boolean and having the segment as an out argument. Then you could write the caller more cleanly:

if (GetLatestRecordBoundarySegment(last_notified, flushed, &latest_boundary_segment))
{
    SetLastNotified( ... );
    RemoveRecordBoundaries( ... );
    LWLockRelease( ... );
    for (..)
        XLogArchiveNotifySeg(...);
    PgArchWakeup();
}
else
    LWLockRelease(...);

-- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "La virtud es el justo medio entre dos defectos" (Aristóteles)
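For illustration, the API shape suggested above could look like the following toy sketch (the struct and the ascending-array representation are invented here; the real map is a hash table protected by a lock):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogSegNo;

/* Toy segment boundary map: an ascending array of pending boundaries. */
typedef struct
{
    XLogSegNo boundaries[16];
    int       count;
} SegBoundaryMap;

/*
 * Suggested API shape: return a boolean indicating whether any boundary
 * has been reached by the flush point, with the latest such boundary
 * segment returned through an out argument.
 */
static bool
GetLatestRecordBoundarySegment(const SegBoundaryMap *map,
                               XLogSegNo flushed_seg,
                               XLogSegNo *latest)
{
    bool found = false;

    /* Array is sorted ascending, so scan until we pass the flush point. */
    for (int i = 0; i < map->count && map->boundaries[i] <= flushed_seg; i++)
    {
        *latest = map->boundaries[i];
        found = true;
    }
    return found;
}
```

With this shape the caller's lock-release and notification logic falls naturally inside a single if/else, as in the sketch in the message above.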
Re: Re[5]: On login trigger: take three
On Tue, Aug 17, 2021 at 1:11 AM Ivan Panchenko wrote: > > Does it mean that "RAISE NOTICE" shouldn’t be used, or behaves unexpectedly in > logon triggers? Should we mention this in the docs? > No, I don't believe so; it was just that that part of the test framework (sub poll_query_until) had been changed to regard anything output to stderr as an error (so now, for the test to succeed, whatever is printed to stdout must match the expected test output, and stderr must be empty). Regards, Greg Nancarrow Fujitsu Australia
Re: Allow composite foreign keys to reference a superset of unique constraint columns?
On Mon, Aug 16, 2021 at 4:37 PM Paul Martinez wrote: > > It seems like a somewhat useful feature. If people think it would be > useful to > implement, I might take a stab at it when I have time. > > This doesn't seem useful enough for us to be the only implementation to go above and beyond the SQL Standard's specification for the references feature (I assume that is what this proposal suggests). The example does a good job of explaining the idea, but its assumptions aren't especially compelling, so it doesn't make a strong case for how desirable the feature would be. David J.
Allow composite foreign keys to reference a superset of unique constraint columns?
Hey, hackers! While working with some foreign keys, I noticed some mildly unexpected behavior. The columns referenced by a foreign key must naturally have a unique constraint on them:

CREATE TABLE foo (a integer);
CREATE TABLE bar (x integer REFERENCES foo(a));
> ERROR: there is no unique constraint matching given keys for referenced table "foo"

But Postgres doesn't allow a foreign key to reference a set of columns without a unique constraint, even if there is a unique constraint on a subset of those columns (i.e., it doesn't allow referencing a superset of a unique constraint).

CREATE TABLE foo (a integer PRIMARY KEY, b integer);
CREATE TABLE bar (x integer, y integer, FOREIGN KEY (x, y) REFERENCES foo(a, b));
> ERROR: there is no unique constraint matching given keys for referenced table "foo"

It seems to me like there would be nothing wrong with allowing this foreign key constraint to exist. Because there is a unique constraint on foo(a), foo(a, b) will also be unique. And it doesn't seem like it would be too complex to implement. Neither MATCH SIMPLE nor MATCH FULL constraints would have any issues with this. MATCH PARTIAL might, but, alas, it's not implemented. (I've had a few ideas about foreign keys, and MATCH PARTIAL seems to always come up, and I still don't understand what its use case is.) A real-world use case that uses denormalization could run into this.
Imagine a naive music database that has a list of artists, albums, and songs, where each album is by one artist and each song is on one album, but we still store a reference to the artist on each song:

CREATE TABLE artists (id serial PRIMARY KEY, name text);
CREATE TABLE albums (
  id serial PRIMARY KEY,
  artist_id integer REFERENCES artists(id),
  name text
);
CREATE TABLE songs (
  id serial PRIMARY KEY,
  artist_id integer REFERENCES artists(id) ON DELETE CASCADE,
  album_id integer REFERENCES albums(id) ON DELETE CASCADE,
  name text
);

To ensure that artist deletions are fast, we need to create an index on songs(artist_id) and songs(album_id). But suppose we wanted to save on index space, and we never needed to query JUST by album_id. We could then do:

CREATE TABLE songs (
  id serial PRIMARY KEY,
  artist_id integer REFERENCES artists(id) ON DELETE CASCADE,
  album_id integer,
  name text,
  FOREIGN KEY (artist_id, album_id) REFERENCES albums(artist_id, id) ON DELETE CASCADE
);

And then we could have a single index on songs(artist_id, album_id) that would serve both ON DELETE CASCADE triggers:

-- Delete artist
DELETE FROM songs WHERE artist_id = ;
-- Delete album
DELETE FROM songs WHERE artist_id = AND album_id = ;

But Postgres wouldn't let us create the composite foreign key described. It seems like a somewhat useful feature. If people think it would be useful to implement, I might take a stab at it when I have time. - Paul
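For what it's worth, the validation change this proposal asks for boils down to a set-containment test: the referenced column list is acceptable if it contains every column of some unique constraint on the referenced table. A hypothetical sketch, with plain attribute numbers standing in for the catalog lookups (names invented here, not actual backend code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Does the set of referenced FK columns (by attribute number) contain
 * every column of a given unique constraint? If so, the referenced
 * column set is necessarily unique as well.
 */
static bool
fk_covers_unique_key(const int *fk_cols, int n_fk,
                     const int *key_cols, int n_key)
{
    for (int i = 0; i < n_key; i++)
    {
        bool found = false;

        for (int j = 0; j < n_fk; j++)
        {
            if (fk_cols[j] == key_cols[i])
            {
                found = true;
                break;
            }
        }
        if (!found)
            return false;
    }
    return true;
}
```

In this sketch, a foreign key on songs(artist_id, album_id) referencing albums(artist_id, id) would be accepted because the referenced columns cover the unique constraint on albums(id).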
Re: The Free Space Map: Problems and Opportunities
On Mon, Aug 16, 2021 at 10:35:45AM -0700, Peter Geoghegan wrote: > I have suspected that there are serious design issues with the FSM > (and related heapam code like hio.c) for several years now [1]. This > has a lot to do with the experience of using Jan Wieck's BenchmarkSQL > implementation of TPC-C [2][3][4]. It clearly makes Postgres exhibit > pathological performance issues, especially heap fragmentation. OK, here is my feedback. First, I understand the space reservation/soccer idea, but I am also concerned that adding space reservation could improve some things and make others worse. Second, let's look at a concrete example, and see how it can be improved more simply. As I understand it, there are three operations that need free space: 1. INSERT/COPY 2. UPDATE where there is insufficient space on the page of the old row 3. UPDATE where there is sufficient page space The third option already can use the existing page for a HOT update, so the FSM doesn't matter. For 1 & 2, suppose you have a table with 10 8k pages, all 80% full. As I understand it, the first page will go from 80% to 81%, then the next row addition will take the second page from 80% to 81%, until all pages are 81%, and then it starts over again. Is that accurate? Is that the non-temporal memory issue you are concerned about? Doesn't spreading the new rows out increase the ability to do HOT updates? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Autovacuum on partitioned table (autoanalyze)
On Mon, Aug 16, 2021 at 05:42:48PM -0400, Álvaro Herrera wrote: > On 2021-Aug-16, Álvaro Herrera wrote: > > > Here's the reversal patch for the 14 branch. (It applies cleanly to > > master, but the unused member of PgStat_StatTabEntry needs to be > > removed and catversion bumped). > > I have pushed this to both branches. (I did not remove the item from > the release notes in the 14 branch.) |I retained the addition of relkind 'p' to tables included by |pg_stat_user_tables, because reverting that would require a catversion |bump. Right now, on v15dev, it shows 0, which is misleading. Shouldn't it be null? analyze_count | 0 Note that having analyze_count and last_analyze would be an independently useful change. Since parent tables aren't analyzed automatically, I have a script to periodically process them if they weren't processed recently. Right now, for partitioned tables, the best I could find is to check their partitions: | MIN(last_analyzed) FROM pg_stat_all_tables psat JOIN pg_inherits i ON psat.relid=i.inhrelid In 20200418050815.ge26...@telsasoft.com I wrote: |This patch includes partitioned tables in pg_stat_*_tables, which is great; I |complained awhile ago that they were missing [0]. It might be useful if that |part was split out into a separate 0001 patch (?). | [0] https://www.postgresql.org/message-id/20180601221428.GU5164%40telsasoft.com -- Justin
Re: Autovacuum on partitioned table (autoanalyze)
On 2021-Aug-16, Álvaro Herrera wrote: > Here's the reversal patch for the 14 branch. (It applies cleanly to > master, but the unused member of PgStat_StatTabEntry needs to be > removed and catversion bumped). I have pushed this to both branches. (I did not remove the item from the release notes in the 14 branch.) It upsets me to have reverted it, but after spending so much time trying to correct the problems, I believe it just wasn't salvageable within the beta-period code freeze constraints. I described the issues I ran into in earlier messages; I think a good starting point to re-develop this is to revert the reversal commit, then apply my patch at https://postgr.es/m/0794d7ca-5183-486b-9c5e-6d434867c...@www.fastmail.com then do something about the remaining problems that were complained about. (Maybe: add an "ancestor OID" member to PgStat_StatTabEntry so that the collector knows to propagate counts from children to ancestors when the upd/ins/del counts are received. However, consider developing it as follow-up to Horiguchi-san's shmem pgstat rather than current pgstat implementation.) Thanks -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
Re: Reducing memory consumption for pending inval messages
"Bossart, Nathan" writes: > On 5/30/21, 10:22 AM, "Tom Lane" wrote: >> We can do a lot better, by exploiting what we know about the usage >> patterns of invalidation requests. > I spent some time looking through this patch, and it seems reasonable > to me. Thanks for reviewing! >> There is one notable new assumption I had to make for this. At end >> of a subtransaction, we have to merge its inval events into the >> "PriorCmd" list of the parent subtransaction. (It has to be the >> PriorCmd list, not the CurrentCmd list, because these events have >> already been processed locally; we don't want to do that again.) >> This means the parent's CurrentCmd list has to be empty at that >> instant, else we'd be trying to merge sublists that aren't adjacent >> in the array. As far as I can tell, this is always true: the patch's >> check for it doesn't trigger in a check-world run. And there's an >> argument that it must be true for semantic consistency (see comments >> in patch). So if that check ever fails, it probably means there is a >> missing CommandCounterIncrement somewhere. Still, this could use more >> review and testing. > I didn't discover any problems with this assumption in my testing, > either. Perhaps it'd be good to commit something like this sooner in > the v15 development cycle to maximize the amount of coverage it gets. Yeah, that's a good point. I'll go push this. regards, tom lane
Re: Reducing memory consumption for pending inval messages
On 5/30/21, 10:22 AM, "Tom Lane" wrote: > We can do a lot better, by exploiting what we know about the usage > patterns of invalidation requests. New requests are always added to > the latest sublist, and the only management actions are (1) merge > latest sublist into next-to-latest sublist, or (2) drop latest > sublist, if a subtransaction aborts. This means we could perfectly > well keep all the requests in a single, densely packed array in > TopTransactionContext, and replace the "list" control structures > with indexes into that array. The attached patch does that. I spent some time looking through this patch, and it seems reasonable to me. > There is one notable new assumption I had to make for this. At end > of a subtransaction, we have to merge its inval events into the > "PriorCmd" list of the parent subtransaction. (It has to be the > PriorCmd list, not the CurrentCmd list, because these events have > already been processed locally; we don't want to do that again.) > This means the parent's CurrentCmd list has to be empty at that > instant, else we'd be trying to merge sublists that aren't adjacent > in the array. As far as I can tell, this is always true: the patch's > check for it doesn't trigger in a check-world run. And there's an > argument that it must be true for semantic consistency (see comments > in patch). So if that check ever fails, it probably means there is a > missing CommandCounterIncrement somewhere. Still, this could use more > review and testing. I didn't discover any problems with this assumption in my testing, either. Perhaps it'd be good to commit something like this sooner in the v15 development cycle to maximize the amount of coverage it gets. Nathan
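To make the index bookkeeping concrete, here is a toy model of the packed-array scheme described above (the struct and function names are invented for illustration, not the actual patch's data structures):

```c
#include <assert.h>

/*
 * Toy model: all pending invalidation events live in one densely packed
 * array, and each transaction nesting level is represented only by two
 * indexes into that array -- where its "prior commands" events begin,
 * and where its "current command" events begin.
 */
typedef struct
{
    int n_events;           /* total events in the packed array */
    int prior_start[8];     /* per level: where PriorCmd events begin */
    int current_start[8];   /* per level: where CurrentCmd events begin */
    int depth;              /* current nesting level, 0 = top */
} InvalState;

/*
 * Subtransaction commit: merge its events into the parent's PriorCmd
 * list. With a packed array this is pure index bookkeeping -- no data
 * is copied. It relies on the assumption discussed above: the parent's
 * CurrentCmd list is empty, so the sublists being merged are adjacent.
 */
static void
merge_into_parent(InvalState *s)
{
    int parent = s->depth - 1;

    assert(s->current_start[parent] == s->prior_start[s->depth]);
    s->current_start[parent] = s->n_events;     /* parent's CurrentCmd stays empty */
    s->depth = parent;
}

/* Subtransaction abort: just truncate the array at the sublist's start. */
static void
drop_latest(InvalState *s)
{
    s->n_events = s->prior_start[s->depth];
    s->depth--;
}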
Re: badly calculated width of emoji in psql
On Mon, Aug 16, 2021 at 1:04 PM Jacob Champion wrote: > > On Mon, 2021-08-16 at 11:24 -0400, John Naylor wrote: > > > > On Sun, Aug 15, 2021 at 10:45 PM Michael Paquier wrote: > > > > > How large do libpgcommon deliverables get with this patch? Skimming > > > through the patch, that just looks like a couple of bytes, still. > > > > More like a couple thousand bytes. That's because the width > > of mbinterval doubled. If this is not desirable, we could teach the > > scripts to adjust the width of the interval type depending on the > > largest character they saw. > > True. Note that the combining character table currently excludes > codepoints outside of the BMP, so if someone wants combinations in > higher planes to be handled correctly in the future, the mbinterval for > that table may have to be widened anyway. Hmm, somehow it escaped my attention that the combining character table script explicitly excludes those. There's no comment about it. Maybe best to ask Peter E. (CC'd) Peter, does the combining char table exclude values > 0x for size reasons, correctness, or some other consideration? -- John Naylor EDB: http://www.enterprisedb.com
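For context on why the mbinterval width matters: the lookup is a binary search over sorted codepoint intervals, so widening the bounds from 16 to 32 bits is what allows entries above U+FFFF to exist at all. A sketch in the style of mbbisearch() in wchar.c, with a few illustrative intervals (the table contents here are examples, not the generated tables):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Widened interval: 32-bit bounds so codepoints beyond the BMP fit. */
typedef struct
{
    uint32_t first;
    uint32_t last;
} mbinterval;

/* Binary search over sorted, non-overlapping intervals. */
static bool
in_interval_table(uint32_t ucs, const mbinterval *table, int n)
{
    int lo = 0, hi = n - 1;

    while (lo <= hi)
    {
        int mid = (lo + hi) / 2;

        if (ucs < table[mid].first)
            hi = mid - 1;
        else if (ucs > table[mid].last)
            lo = mid + 1;
        else
            return true;
    }
    return false;
}
```

With 16-bit bounds the third interval below (emoji skin-tone modifiers, U+1F3FB..U+1F3FF) simply could not be represented, which is the size/coverage trade-off being discussed.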
Re: when the startup process doesn't (logging startup delays)
On Sat, Aug 14, 2021 at 5:47 PM Justin Pryzby wrote: > Should this feature distinguish between crash recovery and archive recovery on > a hot standby ? Otherwise the standby will display this all the time. > > 2021-08-14 16:13:33.139 CDT startup[11741] LOG: redo in progress, elapsed > time: 124.42 s, current LSN: 0/EEE2100 > > If so, I think maybe you'd check !InArchiveRecovery (but until Robert finishes > cleanup of xlog.c variables, I can't say that with much confidence). Hmm. My inclination is to think that on an actual standby, you wouldn't want to get messages like this, but if you were doing a point-in-time-recovery, then you would. So I think maybe InArchiveRecovery is not the right thing. Perhaps StandbyMode? -- Robert Haas EDB: http://www.enterprisedb.com
ALTER TYPE vs extension membership (was Re: BUG #17144: Upgrade from v13 to v14 with the cube extension failed)
I wrote: > Hm, thanks. This does not seem to be a problem with pg_upgrade per se; > you can reproduce it with > regression=# CREATE EXTENSION cube VERSION '1.4'; > CREATE EXTENSION > regression=# CREATE EXTENSION earthdistance; > CREATE EXTENSION > regression=# ALTER EXTENSION "cube" UPDATE; > ERROR: type earth is already a member of extension "earthdistance" > [ experiments a bit more ] It might just be directly-dependent types. > If I create a table column of type cube, then nothing strange happens > during the upgrade. But if I create a domain over cube, then do the > update, the domain gets absorbed into the extension. That'd be kind > of annoying :-( So the problem is that ALTER TYPE SET recurses to dependent domains, as it must, but it is not careful about what that implies for extension membership. It looks like we need to extend GenerateTypeDependencies so that it knows not to mess with extension dependencies when doing an ALTER. There's a policy question here, which is when does an operation on a pre-existing object within an extension script cause the object to be absorbed into the extension? You might naively say "never", but that's not our historical behavior, and I think it'd clearly be the wrong thing in some cases. For example, consider CREATE TYPE cube; -- make a shell type -- do something that requires type cube to exist CREATE EXTENSION cube; We don't want this to fail, because it might be necessary to do things that way to get out of circular dependencies. On the other hand, the formerly-shell type had certainly better wind up owned by the extension. The general policy as the code stands seems to be that CREATE OR REPLACE-style operations will absorb any replaced object into the extension. IMO extension scripts generally shouldn't use CREATE OR REPLACE unless they're sure they already have such an object; but if one does use such a command, I think this behavior is reasonable. The situation is a lot less clear-cut for ALTER commands, though. 
Again, it's dubious that an extension should ever apply ALTER to an object that it doesn't know it already owns; but if it does, should that result in absorbing the object? I'm inclined to think not, so the attached patch just causes AlterTypeRecurse and AlterDomainDefault to never change the object's extension membership. That's more behavioral change than is strictly needed to fix the bug at hand, but I think it's a consistent definition. I looked around for other places that might have similar issues, and the only one I could find (accepting that CREATE OR REPLACE should work this way) is that ALTER OPERATOR ... SET applies makeOperatorDependencies, which has the same sort of behavior as GenerateTypeDependencies does. I'm inclined to think that for consistency, we should make that case likewise not change extension membership; but I've not done anything about it in the attached. Another point that perhaps deserves discussion is whether it's okay to change the signature of GenerateTypeDependencies in stable branches (we need to fix this in v13 not only v14/HEAD). I can't see a good reason for extensions to be calling that, and codesearch.debian.net knows of no outside callers, so I'm inclined to just change it. If anyone thinks that's too risky, we could do something with a wrapper function in v13. Comments? regards, tom lane diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c index 10f3119670..07bcdc463a 100644 --- a/src/backend/catalog/pg_depend.c +++ b/src/backend/catalog/pg_depend.c @@ -179,6 +179,13 @@ recordMultipleDependencies(const ObjectAddress *depender, * existed), so we must check for a pre-existing extension membership entry. * Passing false is a guarantee that the object is newly created, and so * could not already be a member of any extension. + * + * Note: isReplace = true is typically used when updating a object in + * CREATE OR REPLACE and similar commands. 
The net effect is that if an + * extension script uses such a command on a pre-existing free-standing + * object, the object will be absorbed into the extension. If the object + * is already a member of some other extension, the command will fail. + * This behavior is desirable for cases such as replacing a shell type. */ void recordDependencyOnCurrentExtension(const ObjectAddress *object, diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c index 6f9b5471da..cdce22f394 100644 --- a/src/backend/catalog/pg_type.c +++ b/src/backend/catalog/pg_type.c @@ -167,6 +167,7 @@ TypeShellMake(const char *typeName, Oid typeNamespace, Oid ownerId) 0, false, false, + true, /* make extension dependency */ false); /* Post creation hook for new shell type */ @@ -504,6 +505,7 @@ TypeCreate(Oid newTypeOid, relationKind, isImplicitArray, isDependentType, + true, /* make extension dependency */
Re: [PATCH] Native spinlock support on RISC-V
Daniel Gustafsson writes: > There didn’t seem to be anything left here (at least until there is hardware > in > the buildfarm) so I took the liberty to close it as committed in the CF app. Ah, sorry, I did not realize there was a CF entry. regards, tom lane
Re: [PATCH] Native spinlock support on RISC-V
> On 13 Aug 2021, at 20:09, Tom Lane wrote: > ..I split it off to its own code block and pushed it. There didn’t seem to be anything left here (at least until there is hardware in the buildfarm) so I took the liberty to close it as committed in the CF app. -- Daniel Gustafsson https://vmware.com/
Re: Emit namespace in post-copy output
> On 28 Jul 2021, at 16:15, Daniel Gustafsson wrote: > I took a look at this I agree with the reviewer that it's a good change. Pushed to master, thanks! -- Daniel Gustafsson https://vmware.com/
The Free Space Map: Problems and Opportunities
I have suspected that there are serious design issues with the FSM (and related heapam code like hio.c) for several years now [1]. This has a lot to do with the experience of using Jan Wieck's BenchmarkSQL implementation of TPC-C [2][3][4]. It clearly makes Postgres exhibit pathological performance issues, especially heap fragmentation. There is a clear way in which free space in the two largest tables (orders and order lines) ought to be managed, given the fixed pattern of the workload, but that isn't what we see. I had the opportunity to work with Jan Wieck and Greg Smith directly on this, shortly before I left Crunchy Data several weeks ago. Jan was able to fix some issues on the BenchmarkSQL side, which helped a lot. But the real problems remain. It is generally understood that the FSM leads to Postgres doing badly with TPC-C. I personally believe that this is representative of real practical problems, and not just a problem with this one benchmark. This email is an attempt to impose some order on a disorderly problem space. I'd like to compare notes with other people that are interested in the problem. I suspect that some experienced hackers have yet to be convinced of the importance of the FSM when it comes to bloat. Hopefully I'll manage to make that case convincingly on this thread. If any of my more concrete claims about problems in the FSM seem like they need to be justified, please let me know. I am omitting details in this initial overview email for the sake of brevity. It's already too long. Problems The FSM gives out space without considering the passage of time, or the likelihood that a particular transaction or client will consume its requested free space in a more or less predictable and steady way over time. It has no memory. The FSM therefore fails to capture naturally occurring locality, or at least doesn't specifically care about it. This is the central problem, that all other problems with the FSM seem to stem from in one way or another. 
The FSM should group related rows (e.g. rows from the same transaction or backend) together, so that they reliably go on the same heap page -- careless mixing of unrelated rows should be avoided. When I say "related", I mean related in whatever sense the application or end user thinks of them as related. As a general rule we should expect groups of rows that were inserted by the same transaction to also be read, updated, deleted, or removed by VACUUM together, as a group. While this isn't guaranteed, it's a good working assumption for the FSM. It avoids unnecessary fragmentation. The current FSM design causes significant fragmentation with workloads where users *expect* no fragmentation at all. I see this with TPC-C. The way that the FSM *ought* to behave for the workload in question is intuitively obvious, once you lay it all out. But that's not what we see in practice -- far from it. (Happy to go into more detail on this.) You don't even need temporal locality to see problems. Even random inserts show up certain FSM implementation problems. The FSM lacks even a basic understanding of the *aggregate* result of backends applying various FSM heuristics over time, and with concurrent access. Inserting backends currently behave like an amateur soccer team where every individual soccer player chases the ball, without any coordination among team members. The players in this analogy somehow fail to consider where the ball *is about to be* -- there is no temporal awareness. They also miss the importance of cooperating with each other as a team -- there is also no spatial awareness, and no thought given to second order effects. This leads to increased buffer lock contention and fragmentation. (Other known FSM problems have been omitted for the sake of brevity.) Background -- To recap, the FSM tracks how much free space is in every heap page. 
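(As a concrete illustration of that per-page bookkeeping: the on-disk FSM stores one byte per heap page. The sketch below shows the rounding idea only — it is a simplification, not the exact freespace.c code or its search heuristics.)

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192
#define FSM_CATEGORIES 256
#define FSM_CAT_STEP (BLCKSZ / FSM_CATEGORIES)  /* 32 bytes per category */

/* Round available bytes down to a one-byte category. */
static uint8_t
fsm_cat(int avail)
{
    int cat = avail / FSM_CAT_STEP;

    return (cat > FSM_CATEGORIES - 1) ? (FSM_CATEGORIES - 1) : (uint8_t) cat;
}

/* Conservative lower bound on the bytes a stored category represents. */
static int
fsm_avail(uint8_t cat)
{
    return cat * FSM_CAT_STEP;
}
```

The round-trip inequality (fsm_avail(fsm_cat(n)) <= n) is the practical consequence: in this scheme the map can understate, but never overstate, a page's free space.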
Most FSM maintenance takes place during VACUUM, though ordinary connection backends occasionally help to keep the FSM current, via calls to RecordAndGetPageWithFreeSpace() made from hio.c. There is also a bulk extension mechanism added by commit 719c84c1be, which is used when we detect multiple concurrent inserters. This bulk extension mechanism does change things a little (it gives some thought to systematic effects), though it still has the same basic issues: it doesn't do nearly enough to group likely-related rows together. I suspect that improving the bulk extension mechanism will be an important part of improving the overall picture. That mechanism needs to be more explicit, and more careful about who gets what free space when. I'll go into the significance of this mechanism to the FSM below, under "Opportunities". But first, more background information. Papers -- I've taken time to review the database research literature, which doesn't have all that much to say here. But there are a couple of relevant classic papers that I know of [6][7]. A lot of the considerations for free space
Re: Autovacuum on partitioned table (autoanalyze)
Another possible problem is that before the revert, we accepted

ALTER TABLE some_partitioned_table SET (autovacuum_enabled=on/off);

(also autovacuum_analyze_scale_factor and autovacuum_analyze_threshold) but after the revert this will throw a syntax error. What do people think we should do about that?

1. Do nothing. If somebody finds themselves in that situation, they can use ALTER TABLE .. RESET ... to remove the settings.
2. Silently accept the option and do nothing.
3. Accept the option and throw a warning that it's a no-op.
4. Something else.

Opinions? -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ Officer Krupke, what are we to do? Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")
Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode
On Mon, Aug 16, 2021 at 13:19, Hans Buschmann wrote: > -- > *From:* Ranier Vilela > *Sent:* Monday, August 16, 2021 17:04 > *To:* Hans Buschmann > *Cc:* pgsql-hack...@postgresql.org > *Subject:* Re: PG14: Avoid checking output-buffer-length for every > encoded byte during pg_hex_encode > > Hello Ranier, > > Thank you for your quick response. > > >It's always good to remove immutables from loops, if safe and secure. > > I think it's worse than that, because a loop-changed variable is involved in the > test, so it is not actually immutable. > > >Decode has a regression too. > > good catch, I overlooked it. > > >I think that we can take one more step here. > >pg_hex_decode can be improved too. > > +1 > > By looking a bit more precisely I detected the same checks in > common/base64.c also involving loop-changed variables. Could you please > make the same changes to base64.c? > I will take a look. > > >We can remove limitations from all functions that use hex functions. > >byteaout (from varlena.c) has an artificial limitation to handle only > MaxAllocSize length, but > >nothing prevents it from being prepared for that limitation to be > removed one day. > > +1, but I don't know all the implications of the type change to size_t > uint64 and size_t on 64-bit are equivalent. uint64 and size_t on 32-bit can be weird, but anyway size_t at 32 bits can handle 1GB. > > >Please, take a look at the new version attached. > > Two remarks for decoding (by eyeball inspection of the patch file only, > because I'm still struggling with git etc.): > > 1. The odd-number-of-digits check for decoding is still part of the loop. > It should be before the loop and done only once (by testing that srclen is an even > number). > So we can eliminate the block if (s == srcend) {} inside the loop! > I'm afraid that is not possible. I tested some variations of this check and regress fails in the strings tests.
Anyway, for test purposes, I changed it temporarily in this version, but I'm afraid we'll have to go back. > 2. I noticed that the error messages for hex_decode should be changed as > > s/encoding/decoding/ > > (as in pg_log_fatal("overflow of destination buffer in hex encoding");) > Ok. Changed. > >If possible, can you review the tests for whether there is an overflow at > >pg_hex_encode and pg_hex_decode functions? > > I would prefer to wait for another patch version from you (my development > troubles), perhaps also dealing with base64.c > base64.c can be done in a separate patch. > I don't know how and where any call to the encoding/decoding functions > creates an overflow situation in normal operation. > > I will nonetheless try the tests, but I don't have any experience with it. > Thanks. > BTW the root cause of my work is an attempt to vastly accelerate these > functions (hex and base64 encode/decode), but this is left for another day > (not finished yet) and certainly only on master/new development. > I think this can speed things up a little. > Later on, support on this topic would be highly appreciated... > If I can help, count on me. regards, Ranier Vilela 0001-improve-hex-functions-handling-v1.patch Description: Binary data
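The optimization under discussion — hoisting the destination-length check out of the per-byte loop — can be sketched in a few lines. (Hypothetical function name and return convention: the real pg_hex_encode in src/common has a different signature and reports overflow via error rather than a return code.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char hextbl[] = "0123456789abcdef";

/*
 * Hex-encode src into dst, validating the destination length once,
 * before the loop, instead of re-checking it for every encoded byte.
 * Returns the number of bytes written, or 0 if dst is too small.
 */
static size_t
hex_encode_hoisted(const uint8_t *src, size_t srclen,
                   char *dst, size_t dstlen)
{
    if (dstlen < srclen * 2)        /* single up-front overflow check */
        return 0;

    for (size_t i = 0; i < srclen; i++)
    {
        dst[2 * i]     = hextbl[src[i] >> 4];
        dst[2 * i + 1] = hextbl[src[i] & 0xF];
    }
    return srclen * 2;
}
```

Because the output length is exactly 2 * srclen, the loop body carries no loop-varying bounds test at all, which is the point of the patch; the same shape applies to decoding once the odd-length check is moved before the loop.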
Re: badly calculated width of emoji in psql
On Mon, 2021-08-16 at 11:24 -0400, John Naylor wrote: > > On Sun, Aug 15, 2021 at 10:45 PM Michael Paquier wrote: > > > How large do libpgcommon deliverables get with this patch? Skimming > > through the patch, that just looks like a couple of bytes, still. > > More like a couple thousand bytes. That's because the width > of mbinterval doubled. If this is not desirable, we could teach the > scripts to adjust the width of the interval type depending on the > largest character they saw. True. Note that the combining character table currently excludes codepoints outside of the BMP, so if someone wants combinations in higher planes to be handled correctly in the future, the mbinterval for that table may have to be widened anyway. --Jacob
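The mbinterval widening discussed above can be sketched roughly as follows. This is a simplified illustration, not the actual PostgreSQL code: the table contents here are made up (the real tables are generated from Unicode data files), but it shows why widening the interval bounds from 16 to 32 bits — so that codepoints beyond the BMP, above U+FFFF, can be represented — doubles the per-entry size, and how the bisection lookup works either way.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative interval type: widened to 32-bit bounds so that
 * codepoints outside the BMP (above U+FFFF) fit.  With uint16_t
 * fields, each entry was half this size -- hence the size growth
 * discussed in the thread.
 */
typedef struct
{
    uint32_t first;
    uint32_t last;
} mbinterval;

/*
 * A tiny stand-in table of zero-width (combining) ranges; PostgreSQL
 * generates the real table from Unicode data.  Note the third entry
 * lies in plane 14, which a 16-bit interval could not hold.
 */
static const mbinterval combining[] = {
    {0x0300, 0x036F},           /* combining diacritical marks */
    {0x20D0, 0x20FF},           /* combining marks for symbols */
    {0xE0100, 0xE01EF}          /* variation selectors supplement */
};

/*
 * Binary search over the sorted interval table, in the style of
 * psql's width-lookup code.  Returns 1 if ucs falls in some interval.
 */
static int
mbbisearch(uint32_t ucs, const mbinterval *table, int max)
{
    int min = 0;

    if (ucs < table[0].first || ucs > table[max].last)
        return 0;
    while (max >= min)
    {
        int mid = (min + max) / 2;

        if (ucs > table[mid].last)
            min = mid + 1;
        else if (ucs < table[mid].first)
            max = mid - 1;
        else
            return 1;
    }
    return 0;
}
```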
Re: Autovacuum on partitioned table (autoanalyze)
On 2021-Aug-16, Tom Lane wrote: > =?utf-8?Q?=C3=81lvaro?= Herrera writes: > > Here's the reversal patch for the 14 branch. (It applies cleanly to > > master, but the unused member of PgStat_StatTabEntry needs to be > > removed and catversion bumped). > > I don't follow the connection to catversion? Sorry, I misspoke -- I mean PGSTAT_FORMAT_FILE_ID. I shouldn't just change it, since if I do then the file is reported as corrupted and all counters are lost. So in the posted patch I did as you suggest: > I agree that we probably don't want to change PgStat_StatTabEntry in > v14 at this point. But it'd be a good idea to attach a comment to > the entry saying it's unused but left there for ABI reasons. It's only in branch master that I'd change the pgstat format version and remove the field. This is what I meant with the patch being for v14 and a tweak needed for this in master. A catversion bump would be required to change the definition of pg_stat_user_tables, which the patch being reverted originally changed to include relkind 'p'. A straight revert would remove that, but in my reversal patch I chose to keep it in place. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "Pensar que el espectro que vemos es ilusorio no lo despoja de espanto, sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)
Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode
From: Ranier Vilela Sent: Monday, August 16, 2021 17:04 To: Hans Buschmann Cc: pgsql-hack...@postgresql.org Subject: Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode Hello Ranier, Thank you for your quick response. >It is always good to remove invariants from loops, if safe and secure. I think it's worse, because a loop-changed variable is involved in the test, so it is not actually invariant. >Decode has a regression too. Good catch, I overlooked it. >I think that we can take one more step here. >pg_hex_decode can be improved too. +1 By looking a bit more precisely I detected the same checks in common/base64.c, also involving loop-changed variables. Could you please make the same changes to base64.c? >We can remove limitations from all functions that use the hex functions. >byteaout (from varlena.c) has an artificial limitation to handle only >MaxAllocSize length, but >nothing prevents it from being prepared for that limitation to be removed one >day. +1, but I don't know all the implications of the type change to size_t >Please, take a look at the new version attached. Two remarks for decoding (by eyeball inspection of the patch file only, because I am still struggling with git etc.): 1. The odd-number-of-digits check for decoding is still part of the loop. It should be before the loop and done only once (by testing for an even srclen). So we can eliminate the block if (s == srcend) {} inside the loop! 2. I noticed that the error messages for hex_decode should be changed as s/encoding/decoding/ (as in pg_log_fatal("overflow of destination buffer in hex encoding");) >If possible, can you review the tests to see if there is an overflow in >the pg_hex_encode and pg_hex_decode functions? I would prefer to wait for another patch version from you (my development troubles), perhaps also dealing with base64.c I don't know how and where any call to the encoding/decoding functions creates an overflow situation in normal operation.
I will nonetheless try the tests, but I don't have any experience with them. BTW, the root cause for my work is an attempt to vastly accelerate these functions (hex and base64 encode/decode), but this is left for another day (not finished yet) and certainly only on master/new development. Later on, support on this topic would be highly appreciated... Best regards, Hans Buschmann
Re: Autovacuum on partitioned table (autoanalyze)
=?utf-8?Q?=C3=81lvaro?= Herrera writes: > Here's the reversal patch for the 14 branch. (It applies cleanly to > master, but the unused member of PgStat_StatTabEntry needs to be > removed and catversion bumped). I don't follow the connection to catversion? I agree that we probably don't want to change PgStat_StatTabEntry in v14 at this point. But it'd be a good idea to attach a comment to the entry saying it's unused but left there for ABI reasons. regards, tom lane
Re: Slightly improve initdb --sync-only option's help message
On Mon, Aug 16, 2021 at 4:42 AM Daniel Gustafsson wrote: > > > On 30 Jul 2021, at 18:27, Bossart, Nathan wrote: > > > > On 7/30/21, 2:22 AM, "Daniel Gustafsson" wrote: > >> LGTM. I took the liberty to rephrase the "and exit" part of the initdb > >> help > >> output match the other exiting options in there. Barring objections, I > >> think > >> this is ready. > > > > LGTM. Thanks! > > Pushed to master, thanks! Thank you Daniel and Nathan! Much appreciated. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/
Re: call popcount32/64 directly on non-x86 platforms
I wrote: > On Thu, Aug 12, 2021 at 1:26 AM David Rowley wrote: > > Closer, but I don't see why there's any need to make the fast and slow > > functions external. It should be perfectly fine to keep them static. > > > > I didn't test the performance, but the attached works for me. > > Thanks for that! I still get a big improvement on Power8 / gcc 4.8, but it's not quite as fast as earlier versions, which were around 200ms: > > master: 646ms > v3: 312ms > > This machine does seem to be pickier about code layout than any other I've tried running benchmarks on, but that's still a bit much. In any case, your version is clearer and has the intended effect, so I plan to commit that, barring other comments. Pushed, thanks for looking! -- John Naylor EDB: http://www.enterprisedb.com
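The change being discussed — calling the popcount implementation directly rather than through a function pointer on platforms that don't need a runtime CPUID check for POPCNT — can be sketched as below. This is a hedged illustration, not the committed code: the function names mirror PostgreSQL's, but the dispatch logic is simplified here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Portable software fallback: the classic shift-and-add bit count.
 * On x86 this is the "slow" path selected when the POPCNT instruction
 * is unavailable; on other platforms no runtime check is needed.
 */
static int
pg_popcount64_slow(uint64_t word)
{
    int result = 0;

    while (word != 0)
    {
        result += word & 1;
        word >>= 1;
    }
    return result;
}

/*
 * Direct call with no function-pointer indirection: the compiler picks
 * the implementation at build time.  (The real code additionally keeps
 * runtime dispatch on x86, where POPCNT support must be probed.)
 */
static int
pg_popcount64(uint64_t word)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(word);
#else
    return pg_popcount64_slow(word);
#endif
}
```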
Re: Window Function "Run Conditions"
On Mon, Aug 16, 2021 at 3:28 AM David Rowley wrote: > On Thu, 1 Jul 2021 at 21:11, David Rowley wrote: > > 1) Unsure of the API to the prosupport function. I wonder if the > > prosupport function should just be able to say if the function is > > either monotonically increasing or decreasing or neither then have > > core code build a qual. That would make the job of building new > > functions easier, but massively reduce the flexibility of the feature. > > I'm just not sure it needs to do more in the future. > > I looked at this patch again today and ended up changing the API that > I'd done for the prosupport functions. These just now set a new > "monotonic" field in the (also newly renamed) > SupportRequestWFuncMonotonic struct. This can be set to one of the > values from the newly added MonotonicFunction enum, namely: > MONOTONICFUNC_NONE, MONOTONICFUNC_INCREASING, MONOTONICFUNC_DECREASING > or MONOTONICFUNC_BOTH. > > I also added handling for a few more cases that are perhaps rare but > could be done with just a few lines of code. For example; COUNT(*) > OVER() is MONOTONICFUNC_BOTH as it can neither increase nor decrease > for a given window partition. I think technically all of the standard > set of aggregate functions could have a prosupport function to handle > that case. Min() and Max() could go a little further, but I'm not sure > if adding handling for that would be worth it, and if someone does > think that it is worth it, then I'd rather do that as a separate > patch. > > I put the MonotonicFunction enum in plannodes.h. There's nothing > specific about window functions or support functions. It could, for > example, be reused again for something else such as monotonic > set-returning functions. > > One thing which I'm still not sure about is where > find_window_run_conditions() should be located. Currently, it's in > allpaths.c but that does not really feel like the right place to me. 
> We do have planagg.c in src/backend/optimizer/plan, maybe we need > planwindow.c? > > David > Hi, + if ((res->monotonic & MONOTONICFUNC_INCREASING) == MONOTONICFUNC_INCREASING) The above can be simplified as 'if (res->monotonic & MONOTONICFUNC_INCREASING) ' Cheers
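The reviewer's simplification works because the monotonicity values are laid out as bit flags, with MONOTONICFUNC_BOTH being the union of the two directions. A minimal sketch — the exact enum values are not shown in the thread, so the flag layout below is an assumption:

```c
#include <assert.h>

/*
 * Assumed bit-flag layout for the MonotonicFunction enum described in
 * the mail above; MONOTONICFUNC_BOTH combines both direction bits.
 */
typedef enum MonotonicFunction
{
    MONOTONICFUNC_NONE = 0,
    MONOTONICFUNC_INCREASING = (1 << 0),
    MONOTONICFUNC_DECREASING = (1 << 1),
    MONOTONICFUNC_BOTH = MONOTONICFUNC_INCREASING | MONOTONICFUNC_DECREASING
} MonotonicFunction;

/*
 * Because MONOTONICFUNC_INCREASING is a single bit, the explicit
 * "(x & FLAG) == FLAG" comparison and the bare "x & FLAG" test are
 * equivalent -- which is the simplification suggested in the review.
 */
static int
is_monotonically_increasing(MonotonicFunction m)
{
    return (m & MONOTONICFUNC_INCREASING) != 0;
}
```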
Re: Autovacuum on partitioned table (autoanalyze)
Here's the reversal patch for the 14 branch. (It applies cleanly to master, but the unused member of PgStat_StatTabEntry needs to be removed and catversion bumped). -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ Maybe there's lots of data loss but the records of data loss are also lost. (Lincoln Yeoh) >From cad5b710a531ec6eefc8856177c68d594c60ac8c Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 16 Aug 2021 10:56:07 -0400 Subject: [PATCH] Revert analyze support for partitioned tables This reverts the following commits: 1b5617eb844cd2470a334c1d2eec66cf9b39c41a Describe (auto-)analyze behavior for partitioned tables 0e69f705cc1a3df273b38c9883fb5765991e04fe Set pg_class.reltuples for partitioned tables 41badeaba8beee7648ebe7923a41c04f1f3cb302 Document ANALYZE storage parameters for partitioned tables 0827e8af70f4653ba17ed773f123a60eadd9f9c9 autovacuum: handle analyze for partitioned tables There are efficiency issues in this code when handling databases with large numbers of partitions, and it doesn't look like there is any trivial way to handle those. There are some other issues as well. It's now too late in the cycle for nontrivial fixes, so we'll have to let Postgres 14 users continue to manually ANALYZE their partitioned tables, and hopefully we can fix the issues for Postgres 15. I chose to keep [most of] be280cdad298 ("Don't reset relhasindex for partitioned tables on ANALYZE") because while it was added due to 0827e8af70f4, it is a reasonable change in its own right (since it affects manual analyze as well as autovacuum-induced analyze) and there's no reason to revert it. I retained relkind 'p' in the definition of view pg_stat_user_tables, because that change would require a catversion bump. Also, in pg14 only, I keep a struct member that was added to PgStat_StatTabEntry to avoid breaking compatibility with existing stat files, because changing that would require a catversion bump. Backpatch to 14.
Discussion: https://postgr.es/m/20210722205458.f2bug3z6qzxzp...@alap3.anarazel.de --- doc/src/sgml/maintenance.sgml | 6 -- doc/src/sgml/perform.sgml | 3 +- doc/src/sgml/ref/analyze.sgml | 40 +++-- doc/src/sgml/ref/create_table.sgml | 8 +- doc/src/sgml/ref/pg_restore.sgml | 6 +- src/backend/access/common/reloptions.c | 15 ++-- src/backend/commands/analyze.c | 52 +++- src/backend/commands/tablecmds.c | 47 +-- src/backend/postmaster/autovacuum.c| 66 +++ src/backend/postmaster/pgstat.c| 108 +++-- src/include/pgstat.h | 26 +- 11 files changed, 57 insertions(+), 320 deletions(-) diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml index 998a48fc25..36f975b1e5 100644 --- a/doc/src/sgml/maintenance.sgml +++ b/doc/src/sgml/maintenance.sgml @@ -817,12 +817,6 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu is compared to the total number of tuples inserted, updated, or deleted since the last ANALYZE. -For partitioned tables, inserts, updates and deletes on partitions -are counted towards this threshold; however, DDL -operations such as ATTACH, DETACH -and DROP are not, so running a manual -ANALYZE is recommended if the partition added or -removed contains a statistically significant volume of data. diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index ddd6c3ff3e..89ff58338e 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -1767,8 +1767,7 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; Whenever you have significantly altered the distribution of data within a table, running ANALYZE is strongly recommended. This -includes bulk loading large amounts of data into the table as well as -attaching, detaching or dropping partitions. Running +includes bulk loading large amounts of data into the table. Running ANALYZE (or VACUUM ANALYZE) ensures that the planner has up-to-date statistics about the table. 
With no statistics or obsolete statistics, the planner might diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml index 176c7cb225..c8fcebc161 100644 --- a/doc/src/sgml/ref/analyze.sgml +++ b/doc/src/sgml/ref/analyze.sgml @@ -250,38 +250,20 @@ ANALYZE [ VERBOSE ] [ table_and_columns - If the table being analyzed is partitioned, ANALYZE - will gather statistics by sampling blocks randomly from its partitions; - in addition, it will recurse into each partition and update its statistics. - (However, in multi-level partitioning scenarios, each leaf partition - will only be analyzed once.) - By contrast, if the table being analyzed has inheritance children, - ANALYZE will gather statistics for it twice: - once on the rows of the parent table only, and a s
Re: PoC/WIP: Extended statistics on expressions
On 8/16/21 3:32 AM, Justin Pryzby wrote: On Mon, Dec 07, 2020 at 03:15:17PM +0100, Tomas Vondra wrote: Looking at the current behaviour, there are a couple of things that seem a little odd, even though they are understandable. For example, the fact that CREATE STATISTICS s (expressions) ON (expr), col FROM tbl; fails, but CREATE STATISTICS s (expressions, mcv) ON (expr), col FROM tbl; succeeds and creates both "expressions" and "mcv" statistics. Also, the syntax CREATE STATISTICS s (expressions) ON (expr1), (expr2) FROM tbl; tends to suggest that it's going to create statistics on the pair of expressions, describing their correlation, when actually it builds 2 independent statistics. Also, this error text isn't entirely accurate: CREATE STATISTICS s ON col FROM tbl; ERROR: extended statistics require at least 2 columns because extended statistics don't always require 2 columns, they can also just have an expression, or multiple expressions and 0 or 1 columns. I think a lot of this stems from treating "expressions" in the same way as the other (multi-column) stats kinds, and it might actually be neater to have separate documented syntaxes for single- and multi-column statistics: CREATE STATISTICS [ IF NOT EXISTS ] statistics_name ON (expression) FROM table_name CREATE STATISTICS [ IF NOT EXISTS ] statistics_name [ ( statistics_kind [, ... ] ) ] ON { column_name | (expression) } , { column_name | (expression) } [, ...] FROM table_name The first syntax would create single-column stats, and wouldn't accept a statistics_kind argument, because there is only one kind of single-column statistic. Maybe that might change in the future, but if so, it's likely that the kinds of single-column stats will be different from the kinds of multi-column stats. In the second syntax, the only accepted kinds would be the current multi-column stats kinds (ndistinct, dependencies, and mcv), and it would always build stats describing the correlations between the columns listed. 
It would continue to build standard/expression stats on any expressions in the list, but that's more of an implementation detail. It would no longer be possible to do "CREATE STATISTICS s (expressions) ON (expr1), (expr2) FROM tbl". Instead, you'd have to issue 2 separate "CREATE STATISTICS" commands, but that seems more logical, because they're independent stats. The parsing code might not change much, but some of the errors would be different. For example, the errors "building only extended expression statistics on simple columns not allowed" and "extended expression statistics require at least one expression" would go away, and the error "extended statistics require at least 2 columns" might become more specific, depending on the stats kind. This still seems odd: postgres=# CREATE STATISTICS asf ON i FROM t; ERROR: extended statistics require at least 2 columns postgres=# CREATE STATISTICS asf ON (i) FROM t; CREATE STATISTICS It seems wrong that the command works with added parens, but builds expression stats on a simple column (which is redundant with what analyze does without extended stats). Well, yeah. But I think this is a behavior that was discussed somewhere in this thread, and the agreement was that it's not worth the complexity, as this comment explains * XXX We do only the bare minimum to separate simple attribute and * complex expressions - for example "(a)" will be treated as a complex * expression. No matter how elaborate the check is, there'll always be * a way around it, if the user is determined (consider e.g. "(a+0)"), * so it's not worth protecting against it. Of course, maybe that wasn't the right decision - it's a bit weird that CREATE INDEX on t ((a), (b)) actually "extracts" the column references and stores that in indkeys, instead of treating that as expressions. Patch 0001 fixes the "double parens" issue discussed elsewhere in this thread, and patch 0002 tweaks CREATE STATISTICS to treat "(a)" as a simple column reference. 
But I'm not sure 0002 is something we can do without catversion bump. What if someone created such "bogus" statistics? It's mostly harmless, because the statistics is useless anyway (AFAICS we'll just use the regular one we have for the column), but if they do pg_dump, that'll fail because of this new restriction. OTOH we're still "only" in beta, and IIRC the rule is not to bump catversion after rc1. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company >From 4428ee5b46fe1ce45331c355e1646520ca769991 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Mon, 16 Aug 2021 16:40:43 +0200 Subject: [PATCH 1/2] fix: don't print double parens --- src/backend/utils/adt/ruleutils.c | 2 +- .../regress/expected/create_table_like.out| 6 +- src/test/regress/expected/stats_ext.out | 110 +- 3 files changed, 59 insertions(+), 59 del
Re: Non-decimal integer literals
On Mon, Aug 16, 2021 at 5:52 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > > Here is a patch to add support for hexadecimal, octal, and binary > integer literals: > > 0x42E > 0o112 > 0b100101 > > per SQL:202x draft. > > This adds support in the lexer as well as in the integer type input > functions. The one thing that jumped out at me on a cursory reading is the {integer} rule, which seems to be used nowhere except to call process_integer_literal, which must then inspect the token text to figure out what type of integer it is. Maybe consider 4 separate process_*_literal functions? -- John Naylor EDB: http://www.enterprisedb.com
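The literal forms the patch adds (`0x42E`, `0o112`, `0b100101`) can be parsed by detecting the prefix once and converting with the matching base — which is the spirit of the per-kind `process_*_literal` functions suggested above, instead of one shared rule that re-inspects the token text. The function below is purely hypothetical, a sketch using plain `strtol` rather than PostgreSQL's lexer machinery:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: dispatch on the 0x/0o/0b prefix once, then
 * convert the remainder with the matching base.  (The real lexer
 * recognizes the prefix in separate flex rules; no such function
 * exists in the patch under discussion.)
 */
static long
parse_nondecimal_literal(const char *tok)
{
    if (tok[0] == '0')
    {
        switch (tok[1])
        {
            case 'x':
            case 'X':
                return strtol(tok + 2, NULL, 16);   /* hexadecimal */
            case 'o':
            case 'O':
                return strtol(tok + 2, NULL, 8);    /* octal */
            case 'b':
            case 'B':
                return strtol(tok + 2, NULL, 2);    /* binary */
        }
    }
    return strtol(tok, NULL, 10);                   /* plain decimal */
}
```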
Re: badly calculated width of emoji in psql
On Sun, Aug 15, 2021 at 10:45 PM Michael Paquier wrote: > > On Thu, Aug 12, 2021 at 05:13:31PM -0400, John Naylor wrote: > > I'm a bit concerned about the build dependencies not working right, but > > it's not clear it's even due to the patch. I'll spend some time > > investigating next week. > > How large do libpgcommon deliverables get with this patch? Skimming > through the patch, that just looks like a couple of bytes, still. More like a couple thousand bytes. That's because the width of mbinterval doubled. If this is not desirable, we could teach the scripts to adjust the width of the interval type depending on the largest character they saw. -- John Naylor EDB: http://www.enterprisedb.com
Re[5]: On login trigger: take three
Hi Greg, >Wednesday, July 7, 2021, 3:55 +03:00 from Greg Nancarrow : >On Sun, Jul 4, 2021 at 1:21 PM vignesh C < vignes...@gmail.com > wrote: >> >> CFBot shows the following failure: >> # poll_query_until timed out executing this query: >> # SELECT '0/3046250' <= replay_lsn AND state = 'streaming' FROM >> pg_catalog.pg_stat_replication WHERE application_name = 'standby_1'; >> # expecting this output: >> # t >> # last actual query output: >> # t >> # with stderr: >> # NOTICE: You are welcome! >> # Looks like your test exited with 29 before it could output anything. >> t/001_stream_rep.pl .. >> Dubious, test returned 29 (wstat 7424, 0x1d00) >> >Thanks. >I found that the patch was broken by commit f452aaf7d (the part >"adjust poll_query_until to insist on empty stderr as well as a stdout >match"). >So I had to remove a "RAISE NOTICE" (which was just an informational >message) from the login trigger function, to satisfy the new >poll_query_until expectations. Does it mean that "RAISE NOTICE" shouldn't be used, or behaves unexpectedly, in login triggers? Should we mention this in the docs? Regards, Ivan Panchenko >Also, I updated a PG14 version check (now must check PG15 version). > >Regards, >Greg Nancarrow >Fujitsu Australia
Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode
Welcome. On Mon, Aug 16, 2021 at 5:46 AM, Hans Buschmann wrote: > During some development on encoding-related parts of postgres I stumbled > over the new length-checking code in common/hex.c/pg_hex_encode. > > Unlike in all versions before, the > output-buffer length is checked during encoding of every individual source > byte. > It is always good to remove invariants from loops, if safe and secure. > This may impose quite a regression when using pg_dump on databases with > many/big bytea or lo columns. > Decode has a regression too. > Because all criteria to check the buffer length are known in advance of > encoding (srclen and destlen) I propose doing the check only once before > starting the while-loop. > Good. > > Please find the attached patch for pg14 and master; older versions did not > have this behavior. > I think that we can take one more step here. pg_hex_decode can be improved too. We can remove limitations from all functions that use the hex functions. byteaout (from varlena.c) has an artificial limitation to handle only MaxAllocSize length, but nothing prevents it from being prepared for that limitation to be removed one day. > Tested on pg14-beta3, but applies also on master. > > PS: This is my very first patch, I am in no way an experienced C developer > and don't have a smoothly running development environment or experience > yet. (originally mostly dealing with postgres on Windows). > It seems good to me. Please, take a look at the new version attached. If possible, can you review the tests to see if there is an overflow in the pg_hex_encode and pg_hex_decode functions? regards, Ranier Vilela 0001-improve-hex-functions-handling.patch Description: Binary data
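The core of the proposed change can be sketched as follows. This is an illustrative reimplementation, not the actual patch: the real pg_hex_encode has a different signature and error handling, but the point — hoisting the destination-length check out of the per-byte loop, since both srclen and dstlen are known up front — is the same.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char hextbl[] = "0123456789abcdef";

/*
 * Sketch of the proposed change: one up-front output-buffer check
 * replaces a test per encoded byte.  Returns the number of bytes
 * written, or 0 on overflow (the real function reports the error
 * differently; this return convention is just for the sketch).
 */
static size_t
hex_encode_checked(const char *src, size_t srclen, char *dst, size_t dstlen)
{
    const char *end = src + srclen;
    char       *p = dst;

    /* single check before the loop: every source byte emits 2 chars */
    if (dstlen < srclen * 2)
        return 0;

    while (src < end)
    {
        unsigned char c = (unsigned char) *src++;

        *p++ = hextbl[c >> 4];
        *p++ = hextbl[c & 0xF];
    }
    return (size_t) (p - dst);
}
```

The same reasoning applies to decoding: an even-srclen test before the loop would subsume the per-iteration odd-number-of-digits check discussed above.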
Re: POC: Cleaning up orphaned files using undo logs
> On Wed, Jun 30, 2021 at 07:41:16PM +0200, Antonin Houska wrote: > Antonin Houska wrote: > > > tsunakawa.ta...@fujitsu.com wrote: > > > > > I'm crawling like a snail to read the patch set. Below are my first set > > > of review comments, which are all minor. > > > > Thanks. > > I've added the patch to the upcoming CF [1], so it possibly gets more review > and makes some progress. I've marked myself as the author so it's clear who > will try to respond to the reviews. It's clear that other people did much more > work on the feature than I did so far - they are welcome to add themselves to > the author list. > > [1] https://commitfest.postgresql.org/33/3228/ Hi, I'm crawling through the patch set like an even slower creature than a snail; sorry for the long absence. I'm reading the latest version posted here and, although it's hard to give any high-level design comments on it yet, I thought it could be useful to post a few findings and questions in the meantime. * One question about the functionality: > On Fri, Jan 29, 2021 at 06:30:15PM +0100, Antonin Houska wrote: > Attached is the next version. Changes done: > > * Removed the progress tracking and implemented undo discarding in a simpler > way. Now, instead of maintaining the pointer to the last record applied, > only a boolean field in the chunk header is set when ROLLBACK is > done. This helps to determine whether the undo of a non-committed > transaction can be discarded. Just to clarify, the whole feature was removed for the sake of simplicity, right? * By throwing `make installcheck` at the patch set I'm getting, from time to time, an error on restart: TRAP: FailedAssertion("BufferIsValid(buffers[nbuffers].buffer)", File: "undorecordset.c", Line: 1098, PID: 6055) From what I see, XLogReadBufferForRedoExtended finds an invalid buffer and returns BLK_NOTFOUND. The commentary says: If the block was not found, then it must be discarded later in the WAL.
and continues with skip = false, but fails to get a page from an invalid buffer a few lines later. It seems that the skip flag is supposed to be used in this situation; should it also guard the BufferGetPage part? * Another interesting issue I've found happened inside DropUndoLogsInTablespace, when the process got SIGTERM. It seems processing got stuck on: slist_foreach_modify(iter, &UndoLogShared->shared_free_lists[i]) iterating on the same element over and over. My guess is that clear_private_free_lists was called and caused this unexpected outcome; should the access to shared_free_lists be somehow protected? * I also wonder about the segments in base/undo; the commentary in pg_undodump says: Since the UNDO log is a continuous stream of changes, any hole terminates processing. It looks like it's relatively easy to end up with such holes, and pg_undodump ends up with a message ("found" is added by me and contains a found offset which does not match the expected value): pg_undodump: error: segment 00 missing in log 2, found 10 This does not seem to be causing any real issues, but it's not clear to me whether such a situation with gaps is fine or a problem. Other than that, once more, thank you for this tremendous work; I find the topic extremely important.
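The symptom described for the shared free lists — iterating on the same element over and over — is the classic failure mode of walking a linked list that another process mutates concurrently. A generic, self-contained illustration (not PostgreSQL's ilist code) of how a self-referencing next pointer turns a foreach-style loop into an infinite one, with a defensive step bound to make the hang observable:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal singly linked list node, standing in for the shared free
 * list entries; nothing here is the actual undo-log structure. */
typedef struct node
{
    struct node *next;
} node;

/*
 * Walk the list, but give up after max_steps iterations instead of
 * spinning forever.  Returns the number of nodes visited, or -1 if
 * the bound was hit -- i.e. the walk got stuck (for example on a
 * node whose next pointer points back at itself after a concurrent
 * list reset).
 */
static int
walk_bounded(node *head, int max_steps)
{
    int count = 0;

    for (node *cur = head; cur != NULL; cur = cur->next)
    {
        if (++count > max_steps)
            return -1;          /* stuck on the same element */
    }
    return count;
}
```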
Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o
On Fri, Aug 13, 2021 at 9:29 PM Andres Freund wrote: > > I think we can extend this API but I guess it is better to then do it > > for dsm-based as well so that these get tracked via resowner. > > DSM segments are resowner managed already, so it's not obvious that that'd buy > us much? Although I guess there could be a few edge cases that'd look cleaner, > because we could reliably trigger cleanup in the leader instead of relying on > dsm detach hooks + refcounts to manage when a set is physically deleted? > In an off-list discussion with Thomas and Amit, we tried to discuss how to clean up the shared fileset in the current use case. Thomas suggested that instead of using an individual shared fileset for storing the data for each XID, why don't we just create a single shared fileset for the complete worker lifetime, and when the worker is exiting we can just remove that shared fileset. And for storing XID data, we can just create the files under the same shared fileset and delete those files when we no longer need them. I like this idea and it looks much cleaner; after this, we can get rid of the special cleanup mechanism using 'filesetlist'. I have attached a patch for the same. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com 0001-Better-usage-of-sharedfileset-in-apply-worker.patch Description: Binary data
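The lifecycle Dilip describes — one fileset per worker lifetime, per-XID files created and deleted inside it, everything dropped at worker exit — can be modeled with a toy in-memory sketch. The names and structure below are invented for illustration and deliberately avoid PostgreSQL's real SharedFileSet/BufFile APIs:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FILES 8

/* Toy stand-in for a worker-lifetime fileset: a fixed registry of
 * per-XID file names.  Purely illustrative -- not the patch's code. */
typedef struct
{
    char files[MAX_FILES][64];
    int  nfiles;
} FileSet;

/* Register a per-transaction changes file inside the fileset. */
static void
fileset_create_file(FileSet *fs, unsigned xid)
{
    assert(fs->nfiles < MAX_FILES);
    snprintf(fs->files[fs->nfiles], sizeof(fs->files[0]),
             "%u-changes.tmp", xid);
    fs->nfiles++;
}

/* Delete one XID's file when its data is no longer needed. */
static void
fileset_delete_file(FileSet *fs, unsigned xid)
{
    char name[64];

    snprintf(name, sizeof(name), "%u-changes.tmp", xid);
    for (int i = 0; i < fs->nfiles; i++)
    {
        if (strcmp(fs->files[i], name) == 0)
        {
            /* swap-remove: move the last entry into the hole */
            if (i != fs->nfiles - 1)
                strcpy(fs->files[i], fs->files[fs->nfiles - 1]);
            fs->nfiles--;
            return;
        }
    }
}

/* At worker exit, dropping the fileset removes everything at once --
 * which is what replaces the per-XID 'filesetlist' cleanup. */
static void
fileset_drop(FileSet *fs)
{
    fs->nfiles = 0;
}
```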
Re: Some RELKIND macro refactoring
On 2021-Aug-16, Peter Eisentraut wrote: > + if (rel->rd_rel->relkind == RELKIND_INDEX || > + rel->rd_rel->relkind == RELKIND_SEQUENCE) > + RelationCreateStorage(rel->rd_node, relpersistence); > + else if (RELKIND_HAS_TABLE_AM(rel->rd_rel->relkind)) > + table_relation_set_new_filenode(rel, &rel->rd_node, > + > relpersistence, > + > relfrozenxid, relminmxid); > + else > + Assert(false); I think you could turn this one (and the one in RelationSetNewRelfilenode) around: if (RELKIND_HAS_TABLE_AM(rel->rd_rel->relkind)) table_relation_set_new_filenode(...); else if (RELKIND_HAS_STORAGE(rel->rd_rel->relkind)) RelationCreateStorage(...); > +/* > + * Relation kinds with a table access method (rd_tableam). Although > sequences > + * use the heap table AM, they are enough of a special case in most uses that > + * they are not included here. > + */ > +#define RELKIND_HAS_TABLE_AM(relkind) \ > + ((relkind) == RELKIND_RELATION || \ > + (relkind) == RELKIND_TOASTVALUE || \ > + (relkind) == RELKIND_MATVIEW) Partitioned tables are not listed here, but IIRC there's a patch to let partitioned tables have a table AM so that their partitions inherit it. I'm wondering if that patch is going to have to change this spot; and if it does, how will that affect the callers of this macro that this patch creates. I think a few places assume that HAS_TABLE_AM means that the table has storage, but perhaps it would be better to spell that out explicitly: @@ -303,9 +303,7 @@ verify_heapam(PG_FUNCTION_ARGS) /* * Check that a relation's relkind and access method are both supported. */ - if (ctx.rel->rd_rel->relkind != RELKIND_RELATION && - ctx.rel->rd_rel->relkind != RELKIND_MATVIEW && - ctx.rel->rd_rel->relkind != RELKIND_TOASTVALUE) + if (!(RELKIND_HAS_TABLE_AM(ctx.rel->rd_rel->relkind) && RELKIND_HAS_STORAGE(ctx.rel->rd_rel->relkind))) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot check relation \"%s\"", -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: Autovacuum on partitioned table (autoanalyze)
On 2021-Aug-13, Álvaro Herrera wrote: > Some doc changes are pending, and some more commentary in parts of the > code, but I think this is much more sensible. I do lament the lack of > a syscache for pg_inherits. Thinking about this again, this one here is the killer problem, I think; this behaves pretty horribly if you have more than one partition level, because it'll have to do one indexscan *per level per partition*. (For example, five partitions two levels down mean ten index scans). There's no cache for this, and no way to disable it. So for situations with a lot of partitions, it could be troublesome. Granted, it only needs to be done for partitions with DML changes since the previous autovacuum worker run in the affected database, but still it could be significant. Now we could perhaps have a hash table in partition_analyze_report_ancestors() to avoid the need for repeated indexscans for partitions of the same hierarchy (an open-coded cache to take the place of the missing pg_inherits syscache); and perhaps even use a single seqscan of pg_inherits to capture the whole story first and then filter down to the partitions that we were asked to process ... (so are we building a mini-optimizer to determine which strategy to use in each case?). That all sounds too much to be doing in the beta. So I'm leaning towards the idea that we need to revert the patch and start over for pg15. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)
Re: CI/windows docker vs "am a service" autodetection on windows
On Fri, Aug 13, 2021 at 3:03 PM Andres Freund wrote: > > Hi, > > Magnus, Michael, Anyone - I'd appreciate a look. > > On 2021-03-05 12:55:37 -0800, Andres Freund wrote: > > > After fighting with a windows VM for a bit (ugh), it turns out that yes, > > > there is stderr, but that fileno(stderr) returns -2, and > > > GetStdHandle(STD_ERROR_HANDLE) returns NULL (not INVALID_HANDLE_VALUE). > > > > > > The complexity however is that while that's true for pg_ctl within > > > pgwin32_ServiceMain: > > > checking what stderr=7FF8687DFCB0 is (handle: 0, fileno=-2) > > > but not for postmaster or backends > > > WARNING: 01000: checking what stderr=7FF880F5FCB0 is (handle: 92, > > > fileno=2) > > > > > > which makes sense in a way, because we don't tell CreateProcessAsUser() > > > that it should pass stdin/out/err down (which then seems to magically > > > get access to the "topmost" console applications output - damn, this > > > stuff is weird). > > > > That part is not too hard to address - it seems we only need to do that > > in pg_ctl pgwin32_doRunAsService(). It seems that the > > stdin/stderr/stdout being set to invalid will then be passed down to > > postmaster children. > > > > https://docs.microsoft.com/en-us/windows/console/getstdhandle > > "If an application does not have associated standard handles, such as a > > service running on an interactive desktop, and has not redirected them, > > the return value is NULL." > > > > There does seem to be some difference between what services get as std* > > - GetStdHandle() returns NULL, and what explicitly passing down invalid > > handles to postmaster does - GetStdHandle() returns > > INVALID_HANDLE_VALUE. But passing down NULL rather than > > INVALID_HANDLE_VALUE to postmaster seems to lead to postmaster > > re-opening console buffers. > > > > Patch attached. > > > I'd like to commit something to address this issue to master soon - it > > allows us to run a lot more tests in cirrus-ci... 
But probably not > > backpatch it [yet] - there've not really been field complaints, and who > > knows if there end up being some unintentional side-effects... > > Because it'd allow us to run more tests as part of cfbot and other CI > efforts, I'd like to push this forward. So I'm planning to commit this > to master soon-ish, unless somebody wants to take this over? I'm really > not a windows person... It certainly sounds reasonable. It does make me wonder why we didn't use that GetStdHandle in the first place -- mostly in that "did we try that and it didn't work", but that was long enough ago that I really can't remember, and I am unable to find any references in my mail history either. So it may very well just be that we missed it. But given the number of times we've had issues around this, it makes me wonder. It could of course also be something that didn't use to be reliable but is now -- the world of Windows has changed a lot since that was written. It wouldn't surprise me if it does break some *other* weird cornercase, but based on the docs page you linked to it doesn't look like it would break any of the normal/standard usecases. But I'm also very much not a Windows person these days, and most of my knowledge on the API side is quite outdated by now -- so I can only base that on reading the same manual page you did... Gaining better testability definitely seems worth it, so I think an approach of "push to master and see what explodes" is reasonable :) -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Added schema level support for publication.
Peter Smith writes: > Then the question from Peter E. [2] "Why can't I have a publication > that publishes tables t1, t2, t3, *and* schemas s1, s2, s3." would > have an intuitive solution like: > CREATE PUBLICATION pub1 > FOR TABLE t1,t2,t3 AND > FOR ALL TABLES IN SCHEMA s1,s2,s3; That seems a bit awkward, since the existing precedent is to use commas. We shouldn't need more than one FOR noise-word, either. So I was imagining syntax more like, say, CREATE PUBLICATION pub1 FOR TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2, SEQUENCE seq1,seq2, ALL SEQUENCES IN SCHEMA s3,s4; Abstractly it'd be createpub := CREATE PUBLICATION pubname FOR cpitem [, ... ] [ WITH ... ] cpitem := ALL TABLES | TABLE name | ALL TABLES IN SCHEMA name | ALL SEQUENCES | SEQUENCE name | ALL SEQUENCES IN SCHEMA name | name The grammar output would need some post-analysis to attribute the right type to bare "name" items, but that doesn't seem difficult. regards, tom lane
Some RELKIND macro refactoring
Attached patches introduce more macros to group some RELKIND_* macros: - RELKIND_HAS_PARTITIONS() - RELKIND_HAS_TABLESPACE() - RELKIND_HAS_TABLE_AM() - RELKIND_HAS_INDEX_AM() I collected those mainly while working on the relkind error messages patch [0]. I think they improve the self-documentation of the code in many places that are currently just random collections or endless streams of RELKIND macros. Some may recall the previous thread [1] that made a similar proposal. The result here was that those macros were too thinly sliced and not generally useful enough. My proposal is completely separate from that. [0]: https://www.postgresql.org/message-id/flat/dc35a398-37d0-75ce-07ea-1dd71d98f...@2ndquadrant.com [1]: https://www.postgresql.org/message-id/flat/CAFjFpRcfzs%2Byst6YBCseD_orEcDNuAr9GUTraZ5GC%3DAvCYh55Q%40mail.gmail.com From 356e2e5c56b9856693a5df24038abd0361d97d92 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 16 Aug 2021 14:25:59 +0200 Subject: [PATCH v1 1/2] pg_dump: Remove redundant relkind checks It is checked in flagInhTables() which relkinds can have parents. After that, those entries will have numParents==0, so we don't need to check the relkind again. 
--- src/bin/pg_dump/common.c | 8 +--- src/bin/pg_dump/pg_dump.c | 7 +-- 2 files changed, 2 insertions(+), 13 deletions(-) diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c index 1f24e79665..7b85718075 100644 --- a/src/bin/pg_dump/common.c +++ b/src/bin/pg_dump/common.c @@ -501,13 +501,7 @@ flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int numTables) int numParents; TableInfo **parents; - /* Some kinds never have parents */ - if (tbinfo->relkind == RELKIND_SEQUENCE || - tbinfo->relkind == RELKIND_VIEW || - tbinfo->relkind == RELKIND_MATVIEW) - continue; - - /* Don't bother computing anything for non-target tables, either */ + /* Don't bother computing anything for non-target tables */ if (!tbinfo->dobj.dump) continue; diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 90ac445bcd..d114377bde 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -2725,12 +2725,7 @@ guessConstraintInheritance(TableInfo *tblinfo, int numTables) TableInfo **parents; TableInfo *parent; - /* Sequences and views never have parents */ - if (tbinfo->relkind == RELKIND_SEQUENCE || - tbinfo->relkind == RELKIND_VIEW) - continue; - - /* Don't bother computing anything for non-target tables, either */ + /* Don't bother computing anything for non-target tables */ if (!(tbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)) continue; -- 2.32.0 From 0656f3959518bfa1bd03e8bea3028ccf69b1edad Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 16 Aug 2021 14:30:26 +0200 Subject: [PATCH v1 2/2] Some RELKIND macro refactoring Add more macros to group some RELKIND_* macros: - RELKIND_HAS_PARTITIONS() - RELKIND_HAS_TABLESPACE() - RELKIND_HAS_TABLE_AM() - RELKIND_HAS_INDEX_AM() --- contrib/amcheck/verify_heapam.c| 4 +- contrib/pg_surgery/heap_surgery.c | 4 +- contrib/pg_visibility/pg_visibility.c | 4 +- src/backend/access/index/indexam.c | 3 +- src/backend/catalog/heap.c | 146 + src/backend/catalog/index.c| 2 +- src/backend/commands/indexcmds.c | 
9 +- src/backend/commands/tablecmds.c | 8 +- src/backend/storage/buffer/bufmgr.c| 42 +++ src/backend/utils/adt/amutils.c| 3 +- src/backend/utils/adt/partitionfuncs.c | 7 +- src/backend/utils/cache/relcache.c | 89 +-- src/bin/pg_dump/pg_dump.c | 17 ++- src/include/catalog/pg_class.h | 33 ++ 14 files changed, 153 insertions(+), 218 deletions(-) diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c index 226271923a..1e9624b84d 100644 --- a/contrib/amcheck/verify_heapam.c +++ b/contrib/amcheck/verify_heapam.c @@ -303,9 +303,7 @@ verify_heapam(PG_FUNCTION_ARGS) /* * Check that a relation's relkind and access method are both supported. */ - if (ctx.rel->rd_rel->relkind != RELKIND_RELATION && - ctx.rel->rd_rel->relkind != RELKIND_MATVIEW && - ctx.rel->rd_rel->relkind != RELKIND_TOASTVALUE) + if (!RELKIND_HAS_TABLE_AM(ctx.rel->rd_rel->relkind)) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot check relation \"%s\"", diff --git a/contrib/pg_surgery/heap_surgery.c b/contrib/pg_surgery/heap_surger
Re: PoC/WIP: Extended statistics on expressions
On 8/16/21 3:31 AM, Justin Pryzby wrote: On 1/22/21 5:01 AM, Justin Pryzby wrote: In any case, why are there so many parentheses ? On Fri, Jan 22, 2021 at 02:09:04PM +0100, Tomas Vondra wrote: That's a bug in pg_get_statisticsobj_worker, probably. It shouldn't be adding extra parentheses, on top of what deparse_expression_pretty does. Will fix. The extra parens are still here - is it intended ? Ah, thanks for reminding me! I was looking at this, and the problem is that pg_get_statisticsobj_worker only does this: prettyFlags = PRETTYFLAG_INDENT; Changing that to prettyFlags = PRETTYFLAG_INDENT | PRETTYFLAG_PAREN; fixes this (not sure we need the INDENT flag - probably not). I'm a bit confused, though. My assumption was "PRETTYFLAG_PAREN = true" would force the deparsing itself to add the parens, if needed, but in reality it works the other way around. I guess it's more complicated due to deparsing multi-level expressions, but unfortunately, there's no comment explaining what it does. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Slightly improve initdb --sync-only option's help message
> On 30 Jul 2021, at 18:27, Bossart, Nathan wrote: > > On 7/30/21, 2:22 AM, "Daniel Gustafsson" wrote: >> LGTM. I took the liberty to rephrase the "and exit" part of the initdb help >> output to match the other exiting options in there. Barring objections, I think >> this is ready. > > LGTM. Thanks! Pushed to master, thanks! -- Daniel Gustafsson https://vmware.com/
RE: Why timestamptz_pl_interval and timestamptz_mi_interval are not immutable?
> > What do I miss?
>
> --
> Best regards,
> Alexander Pyhalov,
> Postgres Professional

See for example around DST changes:

postgres=# begin;
BEGIN
postgres=# show timezone;
     TimeZone
------------------
 Europe/Amsterdam
(1 row)

postgres=# select '2021-03-27 15:00 +0100'::timestamptz + interval '1d';
        ?column?
------------------------
 2021-03-28 15:00:00+02
(1 row)

postgres=# set timezone to UTC;
SET
postgres=# select '2021-03-27 15:00 +0100'::timestamptz + interval '1d';
        ?column?
------------------------
 2021-03-28 14:00:00+00
(1 row)

postgres=# select '2021-03-28 15:00:00+02' = '2021-03-28 14:00:00+00';
 ?column?
----------
 f
(1 row)
Why timestamptz_pl_interval and timestamptz_mi_interval are not immutable?
Hi.

I'm currently looking at pushing down SQLValue expressions to foreign servers and was surprised that two timestamptz-related functions are not immutable. I see that this was changed in

commit 1ab415596d1de61561d0de8fe9da4aea207adca4
Author: Tom Lane
Date:   Mon Oct 4 22:13:14 2004 +

    Correct the volatility labeling of ten timestamp-related functions,
    per discussion from Friday.  initdb not forced in this commit but
    I intend to do that later.

I'm not sure why timestamptz_pl_interval and timestamptz_mi_interval are not immutable. Even if we change the timezone during a transaction, addition of the same interval to the same timestamp with time zone gives the same result.

postgres=# begin;
BEGIN
postgres=*# select current_timestamp;
       current_timestamp
-------------------------------
 2021-08-16 13:26:59.366452+03
(1 row)

postgres=*# select timestamptz '2021-08-16 13:26:59.366452+03';
          timestamptz
-------------------------------
 2021-08-16 13:26:59.366452+03
(1 row)

postgres=*# select timestamptz '2021-08-16 13:26:59.366452+03' + '2 days'::interval;
           ?column?
-------------------------------
 2021-08-18 13:26:59.366452+03
(1 row)

postgres=*# set timezone to UTC;
SET
postgres=*# select timestamptz '2021-08-16 13:26:59.366452+03' + '2 days'::interval;
           ?column?
-------------------------------
 2021-08-18 10:26:59.366452+00
(1 row)

postgres=*# select timestamptz '2021-08-18 13:26:59.366452+03' = timestamptz '2021-08-18 10:26:59.366452+00';
 ?column?
----------
 t
(1 row)

What do I miss?

--
Best regards,
Alexander Pyhalov,
Postgres Professional
Re: Window Function "Run Conditions"
On Thu, 1 Jul 2021 at 21:11, David Rowley wrote: > 1) Unsure of the API to the prosupport function. I wonder if the > prosupport function should just be able to say if the function is > either monotonically increasing or decreasing or neither then have > core code build a qual. That would make the job of building new > functions easier, but massively reduce the flexibility of the feature. > I'm just not sure it needs to do more in the future. I looked at this patch again today and ended up changing the API that I'd done for the prosupport functions. These just now set a new "monotonic" field in the (also newly renamed) SupportRequestWFuncMonotonic struct. This can be set to one of the values from the newly added MonotonicFunction enum, namely: MONOTONICFUNC_NONE, MONOTONICFUNC_INCREASING, MONOTONICFUNC_DECREASING or MONOTONICFUNC_BOTH. I also added handling for a few more cases that are perhaps rare but could be done with just a few lines of code. For example; COUNT(*) OVER() is MONOTONICFUNC_BOTH as it can neither increase nor decrease for a given window partition. I think technically all of the standard set of aggregate functions could have a prosupport function to handle that case. Min() and Max() could go a little further, but I'm not sure if adding handling for that would be worth it, and if someone does think that it is worth it, then I'd rather do that as a separate patch. I put the MonotonicFunction enum in plannodes.h. There's nothing specific about window functions or support functions. It could, for example, be reused again for something else such as monotonic set-returning functions. One thing which I'm still not sure about is where find_window_run_conditions() should be located. Currently, it's in allpaths.c but that does not really feel like the right place to me. We do have planagg.c in src/backend/optimizer/plan, maybe we need planwindow.c? David v2_teach_planner_and_executor_about_monotonic_wfuncs.patch Description: Binary data
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
> > (1) There should be no output to stderr in the tests.  Why isn't this
> > message being caught and redirected to the normal test output file?
>
> These are generated during the compilation of the tests with the
> pre-processor, so that's outside the test runs.

This is actually a deeper issue: we have no test for the compiler itself, other than the source code it generates. We do not test warnings or errors thrown by it. The topic came up ages ago and we simply removed the test that generated the (planned) warning message.

> > (2) This message is both unintelligible and grammatically incorrect.
>
> Yeah, debugging such tests would be more helpful if the name of the
> DECLARE statement is included, at least.  Those messages being
> generated is not normal anyway, which is something coming from the
> tests as a typo with the connection name of stmt_3.
>
> Michael, what do you think about the attached?

I think what Tom was saying is that it should be either "is overwritten with" or "is rewritten to", but you raise a very good point. Adding the statement name makes the message better. I fully agree. However, it should be the other way round: the DECLARE STATEMENT changes the connection that is used. Your patch removes the warning, but by doing that it also removes the feature that is being tested. I'm not sure what's the best way to go about it. Shall we accept not testing this particular feature and remove the warning? After all, this is not the way the statement should be used, hence the warning. Or should we keep it in and redirect the warning? In that case, we would also lose other warnings that are not planned, though.

Any comments?

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De
Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Non-decimal integer literals
Here is a patch to add support for hexadecimal, octal, and binary integer literals: 0x42E 0o112 0b100101 per SQL:202x draft. This adds support in the lexer as well as in the integer type input functions. Those core parts are straightforward enough, but there are a bunch of other places where integers are parsed, and one could consider in each case whether they should get the same treatment, for example the replication syntax lexer, or input function for oid, numeric, and int2vector. There are also some opportunities to move some code around, for example scanint8() could be in numutils.c. I have also looked with some suspicion at some details of the number lexing in ecpg, but haven't found anything I could break yet. Suggestions are welcome. From f2a9b37968a55bf91feb2b4753745c9f5a64be2e Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 16 Aug 2021 09:32:14 +0200 Subject: [PATCH v1] Non-decimal integer literals Add support for hexadecimal, octal, and binary integer literals: 0x42E 0o112 0b100101 per SQL:202x draft. This adds support in the lexer as well as in the integer type input functions. 
--- doc/src/sgml/syntax.sgml | 26 src/backend/catalog/sql_features.txt | 1 + src/backend/parser/scan.l| 70 ++-- src/backend/utils/adt/int8.c | 54 src/backend/utils/adt/numutils.c | 97 src/fe_utils/psqlscan.l | 55 +++- src/interfaces/ecpg/preproc/pgc.l| 64 +++--- src/test/regress/expected/int2.out | 19 ++ src/test/regress/expected/int4.out | 37 +++ src/test/regress/expected/int8.out | 19 ++ src/test/regress/sql/int2.sql| 7 ++ src/test/regress/sql/int4.sql| 11 src/test/regress/sql/int8.sql| 7 ++ 13 files changed, 412 insertions(+), 55 deletions(-) diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index d66560b587..8fb4b1228d 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -694,6 +694,32 @@ Numeric Constants + + Additionally, non-decimal integer constants can be used in these forms: + +0xhexdigits +0ooctdigits +0bbindigits + + hexdigits is one or more hexadecimal digits + (0-9, A-F), octdigits is one or more octal + digits (0-7), bindigits is one or more binary + digits (0 or 1). Hexadecimal digits and the radix prefixes can be in + upper or lower case. Note that only integers can have non-decimal forms, + not numbers with fractional parts. 
+ + + + These are some examples of this: +0b100101 +0B10011001 +0o112 +0O755 +0x42e +0X + + + integer bigint diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt index 9f424216e2..d6359503f3 100644 --- a/src/backend/catalog/sql_features.txt +++ b/src/backend/catalog/sql_features.txt @@ -526,6 +526,7 @@ T652SQL-dynamic statements in SQL routines NO T653 SQL-schema statements in external routines YES T654 SQL-dynamic statements in external routines NO T655 Cyclically dependent routines YES +T661 Non-decimal integer literalsYES SQL:202x draft T811 Basic SQL/JSON constructor functionsNO T812 SQL/JSON: JSON_OBJECTAGGNO T813 SQL/JSON: JSON_ARRAYAGG with ORDER BY NO diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l index 6e6824faeb..83458ffb30 100644 --- a/src/backend/parser/scan.l +++ b/src/backend/parser/scan.l @@ -262,7 +262,7 @@ quotecontinuefail {whitespace}*"-"? xbstart[bB]{quote} xbinside [^']* -/* Hexadecimal number */ +/* Hexadecimal byte string */ xhstart[xX]{quote} xhinside [^']* @@ -341,7 +341,7 @@ xcstart \/\*{op_chars}* xcstop \*+\/ xcinside [^*/]+ -digit [0-9] + ident_start[A-Za-z\200-\377_] ident_cont [A-Za-z\200-\377_0-9\$] @@ -380,24 +380,41 @@ self [,()\[\].;\:\+\-\*\/\%\^\<\>\=] op_chars [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=] operator {op_chars}+ -/* we no longer allow unary minus in numbers. - * instead we pass it separately to parser. there it gets - * coerced via doNegate() -- Leon aug 20 1999 +/* + * Numbers + * + * Unary minus is not part of a number here. Instead we pass it separately to + * parser, and there it gets coerced via doNegate(). * - * {decimalfail} is used because we would like "1..10" to lex as 1, dot_dot, 10. + * {numericfail} is used because we would like "1..10" to lex as 1, dot_dot, 10. * * {realfail1} and {realfail2} are added to prevent the need for
Is it worth pushing conditions to sublink/subplan?
Hi Hackers,

Recently, an issue has been bothering me. It's about conditional push-down in SQL; I use cases from the regression tests as an example. I found that the condition (b = 1) can be pushed down into a subquery, but it cannot be pushed down to a sublink/subplan. If a sublink/subplan clause contains a partitioned table, it can be useful to get the conditions for pruning. So, is it worth pushing conditions to sublink/subplan? Anybody have any ideas?

regards,
Wenjing

example:

create table p (a int, b int, c int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create table q (a int, b int, c int) partition by list (a);
create table q1 partition of q for values in (1) partition by list (b);
create table q11 partition of q1 for values in (1) partition by list (c);
create table q111 partition of q11 for values in (1);
create table q2 partition of q for values in (2) partition by list (b);
create table q21 partition of q2 for values in (1);
create table q22 partition of q2 for values in (2);
insert into q22 values (2, 2, 3);

postgres-# explain (costs off)
postgres-# select temp.b from
postgres-# (
postgres(#   select a,b from ab x where x.a = 1
postgres(#   union all
postgres(#   (values(1,1))
postgres(# ) temp,
postgres-# ab y
postgres-# where y.b = temp.b and y.a = 1 and y.b=1;
                  QUERY PLAN
-----------------------------------------------
 Nested Loop
   ->  Seq Scan on ab_a1_b1 y
         Filter: ((b = 1) AND (a = 1))
   ->  Append
         ->  Subquery Scan on "*SELECT* 1"
               ->  Seq Scan on ab_a1_b1 x
                     Filter: ((a = 1) AND (b = 1))
         ->  Result
(8 rows)

The condition (b = 1) can be pushed down into the subquery.
postgres=# explain (costs off)
postgres-# select
postgres-#   y.a,
postgres-#   (Select x.b from ab x where y.a =x.a and y.b=x.b) as b
postgres-# from ab y where a = 1 and b = 1;
                    QUERY PLAN
---------------------------------------------------
 Seq Scan on ab_a1_b1 y
   Filter: ((a = 1) AND (b = 1))
   SubPlan 1
     ->  Append
           ->  Seq Scan on ab_a1_b1 x_1
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b2 x_2
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b3 x_3
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b1 x_4
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b2 x_5
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b3 x_6
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b1 x_7
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b2 x_8
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b3 x_9
                 Filter: ((y.a = a) AND (y.b = b))
(22 rows)

The conditions (b = 1 and a = 1) cannot be pushed down to the sublink/subplan in the target list.

postgres=# explain (costs off)
postgres-# select y.a
postgres-# from ab y
postgres-# where
postgres-#   (select x.a > x.b from ab x where y.a =x.a and y.b=x.b) and
postgres-#   y.a = 1 and y.b = 1;
                      QUERY PLAN
-------------------------------------------------------
 Seq Scan on ab_a1_b1 y
   Filter: ((a = 1) AND (b = 1) AND (SubPlan 1))
   SubPlan 1
     ->  Append
           ->  Seq Scan on ab_a1_b1 x_1
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b2 x_2
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a1_b3 x_3
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b1 x_4
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b2 x_5
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a2_b3 x_6
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b1 x_7
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b2 x_8
                 Filter: ((y.a = a) AND (y.b = b))
           ->  Seq Scan on ab_a3_b3 x_9
                 Filter: ((y.a = a) AND (y.b = b))
(22 rows)

The conditions (b = 1 and a = 1) cannot be pushed down to the sublink/subplan in the where clause.
PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode
During some development on encoding-related parts of Postgres I stumbled over the new length-checking code in common/hex.c's pg_hex_encode. Unlike in all previous versions, the output buffer length is now checked during encoding of every individual source byte. This may cause quite a regression when using pg_dump on databases with many/big bytea or lo columns. Because all criteria for the buffer-length check are known before encoding starts (srclen and destlen), I propose doing the check only once, before the while loop. Please find the attached patch for pg14 and master; older versions did not have this behavior. Tested on pg14-beta3, but it also applies to master. PS: This is my very first patch. I am in no way an experienced C developer and don't have a smoothly running development environment or much experience yet (originally mostly dealing with Postgres on Windows). If it seems useful, somebody could enter it as an open item / resolved item for pg14 after beta 3. Thanks for looking! Hans Buschmann

Attachment: hex_encode_length_check_outside_loop.patch
Re: Failure of subscription tests with topminnow
On Mon, Aug 16, 2021 at 9:24 AM Michael Paquier wrote: > > Hi all, > > topminnow has just failed in a weird fashion: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=topminnow&dt=2021-08-15%2013%3A24%3A58 > # SELECT pid != FROM pg_stat_replication WHERE application_name = 'tap_sub'; > # expecting this output: > # t > # last actual query output: > # > # with stderr: > # ERROR: syntax error at or near "FROM" > # LINE 1: SELECT pid != FROM pg_stat_replication WHERE application_na... > > Looking at the logs, it seems like the problem boils down to an active > slot when using ALTER SUBSCRIPTION tap_sub CONNECTION: > 2021-08-15 18:44:38.027 CEST [16473:2] ERROR: could not start WAL > streaming: ERROR: replication slot "tap_sub" is active for PID 16336 > The "ALTER SUBSCRIPTION tap_sub CONNECTION" would lead to restart the walsender process. Now, here the problem seems to be that the previous walsender process (16336) didn't exit and the new process started to use the slot which leads to the error shown in the log. This is evident from the below part of log where we can see that 16336 has exited after new process started to use the slot. 
2021-08-15 18:44:38.027 CEST [16475:6] tap_sub LOG: received replication command: START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1', publication_names '"tap_pub","tap_pub_ins_only"') 2021-08-15 18:44:38.027 CEST [16475:7] tap_sub STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1', publication_names '"tap_pub","tap_pub_ins_only"') 2021-08-15 18:44:38.027 CEST [16475:8] tap_sub ERROR: replication slot "tap_sub" is active for PID 16336 2021-08-15 18:44:38.027 CEST [16475:9] tap_sub STATEMENT: START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1', publication_names '"tap_pub","tap_pub_ins_only"') 2021-08-15 18:44:38.041 CEST [16475:10] tap_sub LOG: disconnection: session time: 0:00:00.036 user=nm database=postgres host=[local] 2021-08-15 18:44:38.043 CEST [16336:14] tap_sub LOG: disconnection: session time: 0:00:06.367 user=nm database=postgres host=[local] One idea to solve this is to first disable the subscription, wait for the walsender process to exit, and then change the connection string and re-enable the subscription. -- With Regards, Amit Kapila.
Re: Skipping logical replication transactions on subscriber side
On Mon, Aug 16, 2021 at 5:54 PM houzj.f...@fujitsu.com wrote: > > Here is another comment: > > +char * > +logicalrep_message_type(LogicalRepMsgType action) > +{ > ... > + case LOGICAL_REP_MSG_STREAM_END: > + return "STREAM END"; > ... > > I think most the existing code use "STREAM STOP" to describe the > LOGICAL_REP_MSG_STREAM_END message, is it better to return "STREAM STOP" in > function logicalrep_message_type() too ? > +1 I think you're right, it should be "STREAM STOP" in that case. Regards, Greg Nancarrow Fujitsu Australia
RE: Skipping logical replication transactions on subscriber side
Monday, August 16, 2021 3:00 PM Hou, Zhijie wrote: > On Thu, Aug 12, 2021 1:53 PM Masahiko Sawada wrote: > > I've attached the updated patches. FYI I've included the patch > > (v8-0005) that fixes the assertion failure during shared fileset > > cleanup to make cfbot tests happy. > > Hi, > > Thanks for the new patches. > I have a few comments on the v8-0001 patch. > 3) > Do we need to invoke set_apply_error_context_xact() in the function > apply_handle_stream_prepare() to save the xid and timestamp ? Sorry, this comment wasn't correct, please ignore it. Here is another comment: +char * +logicalrep_message_type(LogicalRepMsgType action) +{ ... + case LOGICAL_REP_MSG_STREAM_END: + return "STREAM END"; ... I think most the existing code use "STREAM STOP" to describe the LOGICAL_REP_MSG_STREAM_END message, is it better to return "STREAM STOP" in function logicalrep_message_type() too ? Best regards, Hou zj
Re: ECPG bug fix: DECLARE STATEMENT and DEALLOCATE, DESCRIBE
On Wed, Aug 11, 2021 at 10:41:59PM +0200, Michael Meskes wrote: > I'm not sure I understand. Any usage of DECLARE STATEMENT makes the > file need the current version of ecpg anyway. On the other hand > DEALLOCATE did not change its behavior if no DECLARE STATEMENT was > issued, or what did I miss? Yes, you are right here. I went through the code again and noticed my mistake. Sorry for the noise. -- Michael
RE: Skipping logical replication transactions on subscriber side
On Thu, Aug 12, 2021 1:53 PM Masahiko Sawada wrote:
> I've attached the updated patches. FYI I've included the patch
> (v8-0005) that fixes the assertion failure during shared fileset
> cleanup to make cfbot tests happy.

Hi,

Thanks for the new patches. I have a few comments on the v8-0001 patch.

1)
+
+	if (TransactionIdIsNormal(errarg->remote_xid))
+		appendStringInfo(&buf, _(" in transaction id %u with commit timestamp %s"),
+						 errarg->remote_xid,
+						 errarg->commit_ts == 0
+						 ? "(unset)"
+						 : timestamptz_to_str(errarg->commit_ts));
+
+	errcontext("%s", buf.data);

I think we can output the timestamp in a separate check, which would be more consistent with the rest of the code style in apply_error_callback(), i.e.:

+	if (errarg->commit_ts != 0)
+		appendStringInfo(&buf, _(" with commit timestamp %s"),
+						 timestamptz_to_str(errarg->commit_ts));

2)
+/*
+ * Get string representing LogicalRepMsgType.
+ */
+char *
+logicalrep_message_type(LogicalRepMsgType action)
+{
...
+
+	elog(ERROR, "invalid logical replication message type \"%c\"", action);
+}

Some old compilers might complain that the function doesn't have a return value at the end of the function, maybe we can write it like the following:

+char *
+logicalrep_message_type(LogicalRepMsgType action)
+{
+	switch (action)
+	{
+		case LOGICAL_REP_MSG_BEGIN:
+			return "BEGIN";
...
+		default:
+			elog(ERROR, "invalid logical replication message type \"%c\"", action);
+	}
+	return NULL;	/* keep compiler quiet */
+}

3)
Do we need to invoke set_apply_error_context_xact() in the function apply_handle_stream_prepare() to save the xid and timestamp?

Best regards,
Hou zj