Re: Diagnostic comment in LogicalIncreaseXminForSlot

2021-05-20 Thread Amit Kapila
On Thu, May 20, 2021 at 5:43 PM Ashutosh Bapat
 wrote:
>
> Hi
> LogicalIncreaseRestartDecodingForSlot() has a debug log to report a
> new restart_lsn, but the corresponding function for catalog_xmin does
> not. Here's a patch to add the same.
>

I think this can be useful. One minor comment:
+ elog(DEBUG1, "got new catalog_xmin %u at %X/%X", xmin,
+ (uint32) (current_lsn >> 32), (uint32) current_lsn);

Isn't it better to use LSN_FORMAT_ARGS for current_lsn? Also, there
doesn't seem to be any urgency for adding this, so you can register it
for the next CF so that we can add this when the branch opens for
PG-15.
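
For reference, with LSN_FORMAT_ARGS the elog would reduce to something
like this (a sketch of the suggested change; the macro expands to the
two uint32 halves expected by %X/%X):

    elog(DEBUG1, "got new catalog_xmin %u at %X/%X", xmin,
         LSN_FORMAT_ARGS(current_lsn));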

-- 
With Regards,
Amit Kapila.




Re: Fdw batch insert error out when set batch_size > 65535

2021-05-20 Thread Bharath Rupireddy
On Fri, May 21, 2021 at 8:18 AM houzj.f...@fujitsu.com
 wrote:
>
> Hi,
>
> When reading the code, I noticed a possible issue with fdw batch inserts.
> When I set batch_size > 65535 and insert more than 65535 rows into a
> foreign table, it throws an error:
>
> For example:
>
> --
> CREATE FOREIGN TABLE vzdalena_tabulka2(a int, b varchar)
>   SERVER omega_db
>   OPTIONS (table_name 'tabulka', batch_size '65536');
>
> INSERT INTO vzdalena_tabulka2 SELECT i, 'AHOJ' || i FROM 
> generate_series(1,65536) g(i);
>
> ERROR:  number of parameters must be between 0 and 65535
> CONTEXT:  remote SQL command: INSERT INTO public.tabulka(a, b) VALUES ($1, 
> $2), ($3, $4)...
> --
>
> Actually, I think we can get this error whenever (number of columns) *
> (number of rows) > 65535. But almost no one will set such a large value,
> so I think adjusting the batch_size automatically when doing the INSERT
> seems an acceptable solution.
>
> In postgresGetForeignModifyBatchSize(), if we find that (num of params) *
> batch_size is greater than the limit (65535), then we set the batch size
> to 65535 / (num of params).
>
> Thoughts ?

+1 to limit batch_size for postgres_fdw and it's a good idea to have a
macro for the max params.

Few comments:
1) How about using macro in the error message, something like below?
appendPQExpBuffer(errorMessage,
  libpq_gettext("number of parameters must be
between 0 and %d\n"),
  PQ_MAX_PARAM_NUMBER);
2) I think we can choose not to mention the 65535 because it's hard to
maintain if that's changed in the libpq protocol sometime in the future.
How about
"The final number of rows postgres_fdw inserts in a batch actually
depends on the number of columns and the provided batch_size value.
This is because of the limit the libpq protocol (which postgres_fdw
uses to connect to a remote server) has on the number of query
parameters that can be specified per query. For instance, if the
number of columns * batch_size is more than the limit, then the libpq
emits an error. But postgres_fdw adjusts the batch_size to avoid this
error."
instead of
+   overrides an option specified for the server. Note if the batch size
+   exceed the protocol limit (number of columns * batch_size > 65535),
+   then the actual batch size will be less than the specified batch_size.
3) I think "postgres_fdw should insert in each insert operation"
doesn't hold after this patch. We can reword it to "postgres_fdw
inserts in each insert operation".
   This option specifies the number of rows
postgres_fdw
   should insert in each insert operation. It can be specified for a
4) How about naming the macro as PQ_QUERY_PARAM_MAX_LIMIT?
5) We can use macro in code comments as well.
+ * 65535, so set the batch_size to not exceed limit in a batch insert.
6) I think both code and docs patches can be combined into a single patch.
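
For context, a minimal sketch of the proposed clamp (assuming the macro
name suggested in point 4; the real patch derives the parameter count
from the modify state, so the names here are illustrative):

    #define PQ_QUERY_PARAM_MAX_LIMIT 65535

    static int
    clamp_batch_size(int batch_size, int num_params)
    {
        /* divide rather than multiply, to avoid integer overflow */
        if (num_params > 0 &&
            batch_size > PQ_QUERY_PARAM_MAX_LIMIT / num_params)
            batch_size = PQ_QUERY_PARAM_MAX_LIMIT / num_params;
        return batch_size;
    }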

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: parallel vacuum - few questions on docs, comments and code

2021-05-20 Thread Amit Kapila
On Fri, May 14, 2021 at 6:00 PM Bharath Rupireddy
 wrote:
>
> On Fri, May 14, 2021 at 10:43 AM Dilip Kumar  wrote:
> >
> >
> > Your changes look good.  About changing the "non-negative integer" to
> > "greater than or equal to zero", there is another thread [1], I am not
> > sure that have we concluded anything there yet.
> >
> > - pg_log_error("parallel vacuum degree must be a non-negative integer");
> > + pg_log_error("parallel workers for vacuum must be greater than or
> > equal to zero");
> >   exit(1);
> >
> > [1] 
> > https://www.postgresql.org/message-id/os0pr01mb5716415335a06b489f1b3a8194...@os0pr01mb5716.jpnprd01.prod.outlook.com
>
> Yeah. Tom proposed if (foo <= 0) { error:"foo must be greater than
> zero" } at [1]. In the subsequent messages both Michael and I agreed
> with that. But we also have cases like  if (foo < 0) for which I think
> { error:"foo must be greater than or equal to zero" } would be better,
> similar to what's proposed. Please feel free to provide your thoughts
> there in that thread.
>

I responded on that thread and it seems there is no objection to the
new message. I have a minor comment on your patch:

- printf(_("  -P, --parallel=PARALLEL_DEGREE  use this many background
workers for vacuum, if available\n"));
+ printf(_("  -P, --parallel=PARALLEL_WORKERS use this many background
workers for vacuum, if available\n"));

If the patch changes the vacuumdb code as above, then isn't it better
to change the vacuumdb docs to reflect the same? See the below part of
the vacuumdb docs:
-P parallel_degree
--parallel=parallel_degree

Also, can you please check if your patch works for PG-13 as well
because I think it is better to backpatch it?

-- 
With Regards,
Amit Kapila.




Re: Subscription tests fail under CLOBBER_CACHE_ALWAYS

2021-05-20 Thread Amit Kapila
On Fri, May 21, 2021 at 6:29 AM Michael Paquier  wrote:
>
> On Thu, May 20, 2021 at 02:57:40PM -0400, Tom Lane wrote:
> > Here's a version that does it like that.  I'm not entirely convinced
> > whether this is better or not.
>
> Hmm.  I think that this is better.  This makes the code easier to
> follow, and the extra information is useful for debugging.
>
> The change looks good to me.
>

Yeah, the change looks good to me as well, but I think we should
consider Amit L's point that maintaining this extra activeRelInfo
might be prone to bugs if the partitioning logic needs to be extended
at other places in worker.c. As the code stands today, it doesn't
seem problematic, so we can go with the second patch if both Tom and
you feel that is a better option.

-- 
With Regards,
Amit Kapila.




Re: Re[3]: On login trigger: take three

2021-05-20 Thread Greg Nancarrow
On Thu, May 20, 2021 at 2:45 PM Ivan Panchenko  wrote:
>
> I have upgraded the patch for the 14th version.
>

I have some feedback on the patch:

(1) The patch adds 3 whitespace errors ("git apply "
reports 3 warnings)


(2) doc/src/sgml/catalogs.sgml

CURRENTLY:
This flag is used to avoid extra lookup of pg_event_trigger table on
each backend startup.

SUGGEST:
This flag is used to avoid extra lookups on the pg_event_trigger table
during each backend startup.


(3) doc/src/sgml/config.sgml

CURRENTLY:
Errors in trigger code can prevent user to login to the system.In this
case disabling this parameter in connection string can solve the
problem:

SUGGEST:
Errors in the trigger code can prevent a user from logging in to the
system. In this case, the parameter can be disabled in the connection
string, to allow the user to login:


(4) doc/src/sgml/event-trigger.sgml

(i)

CURRENTLY:
An event trigger fires whenever the event with which it is associated
occurs in the database in which it is defined. Currently, the only

SUGGEST:
An event trigger fires whenever an associated event occurs in the
database in which it is defined. Currently, the only


(ii)

CURRENTLY:
can be useful for client connections logging,

SUGGEST:
can be useful for logging client connections,


(5) src/backend/commands/event_trigger.c

(i) There are two instances of code blocks like:

   = table_open(...);
  tuple = SearchSysCacheCopy1(...);
  table_close(...);

These should end with: "heap_freetuple(tuple);"
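
A minimal sketch of the intended pattern (catalog, cache key, and
variable names are illustrative):

    Relation    rel;
    HeapTuple   tuple;

    rel = table_open(EventTriggerRelationId, RowExclusiveLock);
    tuple = SearchSysCacheCopy1(EVENTTRIGGEROID, ObjectIdGetDatum(trigoid));
    /* ... modify and update the tuple ... */
    table_close(rel, RowExclusiveLock);
    heap_freetuple(tuple);  /* SearchSysCacheCopy1 returns a palloc'd copy */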

(ii) Typo "nuder" and grammar:

CURRENTLY:
There can be race condition: event trigger may be added after we have
scanned pg_event_trigger table. Repeat this test nuder  pg_database
table lock.

SUGGEST:
There can be a race condition: the event trigger may be added after we
have scanned the pg_event_trigger table. Repeat this test under the
pg_database table lock.


(6) src/backend/utils/misc/postgresql.conf.sample

CURRENTLY:
+#enable_client_connection_trigger = true # enables firing the
client_connection trigger when a client connect

SUGGEST:
+#enable_client_connection_trigger = true # enables firing the
client_connection trigger when a client connects


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Transactions involving multiple postgres foreign servers, take 2

2021-05-20 Thread Masahiko Sawada
On Fri, May 21, 2021 at 12:45 PM Masahiro Ikeda
 wrote:
>
>
>
> On 2021/05/21 10:39, Masahiko Sawada wrote:
> > On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda  
> > wrote:
> >>
> >>
> >> On 2021/05/11 13:37, Masahiko Sawada wrote:
> >>> I've attached the updated patches that incorporated comments from
> >>> Zhihong and Ikeda-san.
> >>
> >> Thanks for updating the patches!
> >>
> >>
> >> I have other comments including trivial things.
> >>
> >>
> >> a. about the "foreign_transaction_resolver_timeout" parameter
> >>
> >> Now, the default value of "foreign_transaction_resolver_timeout" is
> >> 60 secs. Is there any reason? Although the following is a minor case,
> >> it may confuse some users.
> >>
> >> An example case:
> >>
> >> 1. a client executes a transaction with 2PC while the resolver is
> >> processing FdwXactResolverProcessInDoubtXacts().
> >>
> >> 2. the resolution of the 1st transaction must wait until the other
> >> transactions for 2PC are executed, or until the timeout.
> >>
> >> 3. if the client checks the 1st result value, it should wait until the
> >> resolution is finished for atomic visibility (although this depends on
> >> how atomic visibility is realized). The clients may wait up to
> >> "foreign_transaction_resolver_timeout". Users may think it's stale.
> >>
> >> A situation like this can be observed after testing with pgbench: some
> >> unresolved transactions remain after benchmarking.
> >>
> >> I assume that this default value refers to wal_sender, archiver, and
> >> so on. But I think this parameter is more like "commit_delay". If so,
> >> 60 seconds seems to be a big value.
> >
> > IIUC this situation seems like the foreign transaction resolution is
> > the bottleneck and doesn’t catch up to incoming resolution requests.
> > But how does foreign_transaction_resolver_timeout relate to this
> > situation? foreign_transaction_resolver_timeout controls when to
> > terminate a resolver process that doesn't have any foreign
> > transactions to resolve. So if we set it to several milliseconds,
> > resolver processes are terminated immediately after each resolution,
> > imposing the cost of launching resolver processes on the next
> > resolution.
>
> Thanks for your comments!
>
> No, this situation is not related to whether the foreign transaction
> resolution is a bottleneck or not. This issue may happen when the
> workload has very few foreign transactions.
>
> If a new foreign transaction comes in while the transaction resolver is
> processing resolutions via FdwXactResolverProcessInDoubtXacts(), the
> foreign transaction waits until the next transaction resolution starts.
> If no further foreign transaction comes in, the foreign transaction must
> wait for its resolution until the timeout. I mentioned this situation.

Thanks for your explanation. I think that in this case we should set
the latch of the resolver after preparing all foreign transactions so
that the resolver processes those transactions without sleeping.
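
Something like the following, as a purely hypothetical sketch (the
resolver lookup is made up; SetLatch() and procLatch are the existing
latch APIs):

    /*
     * After preparing all foreign transactions in pre-commit, wake the
     * resolver for our database instead of letting it sleep for up to
     * foreign_transaction_resolver_timeout.
     */
    if (resolver_proc != NULL)
        SetLatch(&resolver_proc->procLatch);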

>
> Thanks for letting me know the side effect of setting the resolution
> timeout to several milliseconds. I agree. But why is termination needed?
> Is there a possibility of going stale, as with walsender?

The purpose of this timeout is to terminate resolvers that are idle
for a long time. The resolver processes don't necessarily need to keep
running all the time for every database. On the other hand, launching
a resolver process per commit would be a high cost. So we have
resolver processes keep running at least for
foreign_transaction_resolver_timeout.

>
>
> >>
> >>
> >> b. about performance bottleneck (just sharing my simple benchmark results)
> >>
> >> The resolver process can easily become a performance bottleneck,
> >> although I think some users want this feature even if the performance
> >> is not so good.
> >>
> >> I tested with a very simple workload on my laptop.
> >>
> >> The test conditions are
> >> * two remote foreign partitions, and one transaction inserts an entry
> >> into each partition.
> >> * local connections only. If NW latency became higher, the performance
> >> became worse.
> >> * pgbench with 8 clients.
> >>
> >> The test results are the following. The performance with 2PC is only
> >> 10% of the performance without 2PC.
> >>
> >> * with foreign_twophase_commit = required
> >> -> If loaded with more than 10TPS, the number of unresolved foreign
> >> transactions keeps increasing, and it stops with the warning "Increase
> >> max_prepared_foreign_transactions".
> >
> > What was the value of max_prepared_foreign_transactions?
>
> Now, I tested with 200.
>
> If each resolution finishes very soon, I thought that would be enough
> because 8 clients x 2 partitions = 16, though... But it's difficult to
> know a stable value.

While resolving one distributed transaction, the resolver needs both
one round trip and an fsync of a WAL record for each foreign
transaction. Since the client doesn’t wait for the distributed
transaction to be resolved, the resolver process can easily become a
bottleneck given there are 8 

Re: Addition of authenticated ID to pg_stat_activity

2021-05-20 Thread Michael Paquier
On Tue, May 18, 2021 at 11:20:49AM +0900, Michael Paquier wrote:
> So that would mean the addition of one new catalog view, called
> pg_stat_connection, with the following fields:
> - PID
> - all three client_*
> - authn ID
> I can live with this split.  Thoughts from others?

Just to make the discussion move on, attached is an updated version
doing that.
--
Michael
From 749422386ed7fd24219c3bce6a5ecfb693602d89 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 21 May 2021 13:27:05 +0900
Subject: [PATCH v2] Add authenticated data to pg_stat_activity

---
 src/include/catalog/pg_proc.dat   |   6 +-
 src/include/utils/backend_status.h|  16 ++-
 src/backend/catalog/system_views.sql  |  12 +-
 src/backend/utils/activity/backend_status.c   |  60 +
 src/backend/utils/adt/pgstatfuncs.c   |  21 +++-
 src/backend/utils/misc/guc.c  |  11 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/test/ldap/t/001_auth.pl   |   4 +-
 src/test/regress/expected/rules.out   |  17 +--
 src/test/ssl/t/001_ssltests.pl|   7 +-
 doc/src/sgml/config.sgml  |  18 +++
 doc/src/sgml/monitoring.sgml  | 119 +-
 12 files changed, 212 insertions(+), 80 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index acbcae4607..6c35496554 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5280,9 +5280,9 @@
   proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
   prorettype => 'record', proargtypes => 'int4',
-  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id}',
+  proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8,text}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,query_id,authenticated_id}',
   prosrc => 'pg_stat_get_activity' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 8042b817df..5762d201b0 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -145,13 +145,16 @@ typedef struct PgBackendStatus
 	char	   *st_appname;
 
 	/*
-	 * Current command string; MUST be null-terminated. Note that this string
-	 * possibly is truncated in the middle of a multi-byte character. As
-	 * activity strings are stored more frequently than read, that allows to
-	 * move the cost of correct truncation to the display side. Use
-	 * pgstat_clip_activity() to truncate correctly.
+	 * Current command string and authenticated ID string; MUST be
+	 * null-terminated.  Note that those strings possibly are truncated in
+	 * the middle of a multi-byte character.  As activity strings are
+	 * stored more frequently than read, that allows to move the cost of
+	 * correct truncation to the display side.  Authenticated ID strings
+	 * are stored once at backend startup but the cost is minimal.
+	 * Use pgstat_clip_string() to truncate both of them correctly.
 	 */
 	char	   *st_activity_raw;
+	char	   *st_authn_id;
 
 	/*
 	 * Command progress reporting.  Any command which wishes can advertise
@@ -267,6 +270,7 @@ typedef struct LocalPgBackendStatus
  */
 extern PGDLLIMPORT bool pgstat_track_activities;
 extern PGDLLIMPORT int pgstat_track_activity_query_size;
+extern PGDLLIMPORT int pgstat_track_connection_authn_size;
 
 
 /* --
@@ -315,7 +319,7 @@ extern uint64 pgstat_get_my_query_id(void);
 extern int	pgstat_fetch_stat_numbackends(void);
 extern PgBackendStatus *pgstat_fetch_stat_beentry(int beid);
 extern LocalPgBackendStatus *pgstat_fetch_stat_local_beentry(int beid);
-extern char *pgstat_clip_activity(const char *raw_activity);

Re: Multi-Column List Partitioning

2021-05-20 Thread Amit Langote
Hello Nitin,

On Thu, May 6, 2021 at 11:03 PM Nitin Jadhav
 wrote:
>
> Hi,
>
> While reviewing one of the 'Table partitioning' related patches, I found
> that Postgres does not support multi-column LIST partitioning. Based on
> this understanding, I have started working on this feature. I also feel
> that 'Multi-Column List Partitioning' can benefit Postgres users in the
> future.

Yes, it would be nice to have this.  Thanks for picking this up.

> I am attaching the WIP patch for this feature here. It supports
> 'Multi-Column List Partitioning', however some tasks are still pending. I
> would like to know your thoughts about this, so that I can continue
> improving the current patch.
>
> Following things are handled in the patch.
> 1. Syntax
>
> CREATE TABLE table_name (attrs) PARTITION BY LIST(list_of_columns);
>
> Earlier there was no provision to mention multiple columns as part of the
> 'list_of_columns' clause. Now we can mention a list of columns separated
> by commas.
>
> CREATE TABLE table_name_p1 PARTITION OF table_name FOR VALUES IN 
> list_of_values.
>
> Whereas list_of_values can be
> a. (value [,...])
> b. (value [,...]) [,...]
>
> I would like to list a few examples here for better understanding.
> Ex-1:
> CREATE TABLE t1(a int) PARTITION BY LIST(a);
> CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN (1, 2, 10, 5, 7);
>
> Ex-2:
> CREATE TABLE t2(a int, b int) PARTITION BY LIST(a,b);
> CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2, 2),(2, 
> 10);

Hmm, why not have parentheses around these lists, that is: (
(list_of_values) [, ...] )

So your example would look like this:

CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN ((1, 2), (1, 5), (2,
2), (2, 10));

IMO, it is not such a bad syntax from a user's PoV.  It's not hard to
understand from this syntax that the partition constraint is something
like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs
row-wise comparison.
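
In other words, the rows admitted into t2_1 would be exactly those
satisfying a predicate like this (a sketch of the implied constraint,
using the values from the example above):

    SELECT * FROM t2
    WHERE (a, b) = (1, 2) OR (a, b) = (1, 5)
       OR (a, b) = (2, 2) OR (a, b) = (2, 10);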

I will now take a look at the patch itself.

--
Amit Langote
EDB: http://www.enterprisedb.com




Re: Transactions involving multiple postgres foreign servers, take 2

2021-05-20 Thread Masahiro Ikeda



On 2021/05/21 10:39, Masahiko Sawada wrote:
> On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda  
> wrote:
>>
>>
>> On 2021/05/11 13:37, Masahiko Sawada wrote:
>>> I've attached the updated patches that incorporated comments from
>>> Zhihong and Ikeda-san.
>>
>> Thanks for updating the patches!
>>
>>
>> I have other comments including trivial things.
>>
>>
>> a. about the "foreign_transaction_resolver_timeout" parameter
>>
>> Now, the default value of "foreign_transaction_resolver_timeout" is
>> 60 secs. Is there any reason? Although the following is a minor case,
>> it may confuse some users.
>>
>> An example case:
>>
>> 1. a client executes a transaction with 2PC while the resolver is
>> processing FdwXactResolverProcessInDoubtXacts().
>>
>> 2. the resolution of the 1st transaction must wait until the other
>> transactions for 2PC are executed, or until the timeout.
>>
>> 3. if the client checks the 1st result value, it should wait until the
>> resolution is finished for atomic visibility (although this depends on
>> how atomic visibility is realized). The clients may wait up to
>> "foreign_transaction_resolver_timeout". Users may think it's stale.
>>
>> A situation like this can be observed after testing with pgbench: some
>> unresolved transactions remain after benchmarking.
>>
>> I assume that this default value refers to wal_sender, archiver, and
>> so on. But I think this parameter is more like "commit_delay". If so,
>> 60 seconds seems to be a big value.
> 
> IIUC this situation seems like the foreign transaction resolution is
> the bottleneck and doesn’t catch up to incoming resolution requests.
> But how does foreign_transaction_resolver_timeout relate to this
> situation? foreign_transaction_resolver_timeout controls when to
> terminate a resolver process that doesn't have any foreign
> transactions to resolve. So if we set it to several milliseconds,
> resolver processes are terminated immediately after each resolution,
> imposing the cost of launching resolver processes on the next
> resolution.

Thanks for your comments!

No, this situation is not related to whether the foreign transaction
resolution is a bottleneck or not. This issue may happen when the
workload has very few foreign transactions.

If a new foreign transaction comes in while the transaction resolver is
processing resolutions via FdwXactResolverProcessInDoubtXacts(), the
foreign transaction waits until the next transaction resolution starts.
If no further foreign transaction comes in, the foreign transaction must
wait for its resolution until the timeout. I mentioned this situation.

Thanks for letting me know the side effect of setting the resolution
timeout to several milliseconds. I agree. But why is termination needed?
Is there a possibility of going stale, as with walsender?


>>
>>
>> b. about performance bottleneck (just sharing my simple benchmark results)
>>
>> The resolver process can easily become a performance bottleneck,
>> although I think some users want this feature even if the performance
>> is not so good.
>>
>> I tested with a very simple workload on my laptop.
>>
>> The test conditions are
>> * two remote foreign partitions, and one transaction inserts an entry
>> into each partition.
>> * local connections only. If NW latency became higher, the performance
>> became worse.
>> * pgbench with 8 clients.
>>
>> The test results are the following. The performance with 2PC is only
>> 10% of the performance without 2PC.
>>
>> * with foreign_twophase_commit = required
>> -> If loaded with more than 10TPS, the number of unresolved foreign
>> transactions keeps increasing, and it stops with the warning "Increase
>> max_prepared_foreign_transactions".
> 
> What was the value of max_prepared_foreign_transactions?

Now, I tested with 200.

If each resolution finishes very soon, I thought that would be enough
because 8 clients x 2 partitions = 16, though... But it's difficult to
know a stable value.


> To speed up the foreign transaction resolution, some ideas have been
> discussed. As another idea, how about launching resolvers for each
> foreign server? That way, we resolve foreign transactions on each
> foreign server in parallel. If foreign transactions are concentrated
> on the particular server, we can have multiple resolvers for the one
> foreign server. It doesn’t change the fact that all foreign
> transaction resolutions are processed by resolver processes.

Awesome! There seems to be another advantage: even if a foreign server is
temporarily busy or stopped due to failover, other foreign servers'
transactions can still be resolved.



> Apart from that, we also might want to improve foreign transaction
> management so that a transaction doesn’t end up with an error if the
> foreign transaction resolution doesn’t catch up with incoming
> transactions that require 2PC. Maybe we can evict and serialize a
> state file when FdwXactCtl->xacts[] is full. I’d like to leave it as a
> future improvement.

Oh, great! I hadn't come up with that idea.

Although I thought the feature makes it difficult to know 

Re: "Multiple table synchronizations are processed serially" still happens

2021-05-20 Thread Amit Kapila
On Fri, May 21, 2021 at 1:30 AM Guillaume Lelarge
 wrote:
>
>
>> If so, the
>> problem might be that copying the data of the first table creates a
>> transaction which blocks creation of the slot for second table copy.
>
>
> I don't understand how a transaction could block the creation of a slot. 
> Could you explain that to me?
>

During the creation of the slot, we need to build the initial snapshot
which is used for decoding WAL. Now, to build the initial snapshot, we
wait for all running xacts to finish. See functions
CreateReplicationSlot() and SnapBuildFindSnapshot().
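
A quick way to see this, as a sketch (table and slot names are made up):

    -- Session 1: keep a write transaction open
    BEGIN;
    INSERT INTO some_table VALUES (1);

    -- Session 2: building the initial snapshot waits for running xacts,
    -- so this blocks until session 1 commits or aborts
    SELECT pg_create_logical_replication_slot('demo_slot', 'pgoutput');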

-- 
With Regards,
Amit Kapila.




Re: multi-install PostgresNode fails with older postgres versions

2021-05-20 Thread Amit Kapila
On Thu, May 20, 2021 at 5:55 PM Andrew Dunstan  wrote:
>
> On 5/20/21 7:15 AM, Michael Paquier wrote:
> > On Thu, May 20, 2021 at 07:05:05AM -0400, Andrew Dunstan wrote:
> >> Your version of perl is apparently too old for this. Looks like that
> >> needs to be 5.22 or later: 
> > Hmm.  src/test/perl/README tells about 5.8.0.  That's quite a jump.
>
> Yes. I've pushed a fix that should take care of the issue.
>

Thanks. It is working now.

-- 
With Regards,
Amit Kapila.




Re: Commitfest app vs. pgsql-docs

2021-05-20 Thread Julien Rouhaud
On Thu, May 20, 2021 at 8:39 AM Michael Paquier  wrote:
>
> On Wed, May 19, 2021 at 03:35:00PM -0400, Tom Lane wrote:
> > Magnus Hagander  writes:
> >> Changing that to look globally can certainly be done. It takes a bit
> >> of work I think, as there are no API endpoints today that will do
> >> that, but those could be added.
> >
> > Ah.  Personally, I'd settle for it checking -hackers, -docs and -bugs.
> > Perhaps there's some case for -general as well.
>
> FWIW, I have seen cases for -general in the past.

+1, I have run into the problem of -general not being usable multiple times.




Re: multi-install PostgresNode fails with older postgres versions

2021-05-20 Thread Tom Lane
Andrew Dunstan  writes:
> On 5/20/21 9:49 PM, Michael Paquier wrote:
>> Are older versions of the perl MSI that activestate provides hard to
>> come by?  FWIW, I would not mind if this README and the docs are
>> updated to mention that on Windows we require a newer version for this
>> set of MSIs.

> I've fixed the coding that led to this particular problem. So for now
> let's let sleeping dogs lie.

Seems like the right solution is for somebody to be running a
buildfarm animal on Windows with an old perl version.

regards, tom lane




Fdw batch insert error out when set batch_size > 65535

2021-05-20 Thread houzj.f...@fujitsu.com
Hi,

When reading the code, I noticed a possible issue with fdw batch inserts.
When I set batch_size > 65535 and insert more than 65535 rows into a
foreign table, it throws an error:

For example:

--
CREATE FOREIGN TABLE vzdalena_tabulka2(a int, b varchar)
  SERVER omega_db
  OPTIONS (table_name 'tabulka', batch_size '65536');

INSERT INTO vzdalena_tabulka2 SELECT i, 'AHOJ' || i FROM 
generate_series(1,65536) g(i);

ERROR:  number of parameters must be between 0 and 65535
CONTEXT:  remote SQL command: INSERT INTO public.tabulka(a, b) VALUES ($1, $2), 
($3, $4)...
--

Actually, I think we can get this error whenever (number of columns) *
(number of rows) > 65535. But almost no one will set such a large value,
so I think adjusting the batch_size automatically when doing the INSERT
seems an acceptable solution.

In postgresGetForeignModifyBatchSize(), if we find that (num of params) *
batch_size is greater than the limit (65535), then we set the batch size
to 65535 / (num of params).

Thoughts ?

Best regards,
houzj




0001-limit-the-fdw-batch-size.patch
Description: 0001-limit-the-fdw-batch-size.patch


0002-doc-note.patch
Description: 0002-doc-note.patch


Re: Race condition in recovery?

2021-05-20 Thread Kyotaro Horiguchi
At Thu, 20 May 2021 13:49:10 -0400, Robert Haas  wrote 
in 
> On Tue, May 18, 2021 at 1:33 AM Dilip Kumar  wrote:
> > Yeah, it will be a fake 1-element list.  But just to be clear that
> > 1-element can only be "ControlFile->checkPointCopy.ThisTimeLineID" and
> > nothing else, do you agree to this?  Because we initialize
> > recoveryTargetTLI to this value and we might change it in
> > readRecoveryCommandFile() but for that, we must get the history file,
> > so if we are talking about the case where we don't have the history
> > file then "recoveryTargetTLI" will still be
> > "ControlFile->checkPointCopy.ThisTimeLineID".
> 
> I don't think your conclusion is correct. As I understand it, you're
> talking about the state before
> ee994272ca50f70b53074f0febaec97e28f83c4e, because as of now
> readRecoveryCommandFile() no longer exists in master. Before that
> commit, StartupXLOG did this:
> 
> recoveryTargetTLI = ControlFile->checkPointCopy.ThisTimeLineID;
> readRecoveryCommandFile();
> expectedTLEs = readTimeLineHistory(recoveryTargetTLI);
> 
> Now, readRecoveryCommandFile() can change recoveryTargetTLI. Before
> doing so, it will call existsTimeLineHistory if
> recovery_target_timeline was set to an integer, and findNewestTimeLine
> if it was set to latest. In the first case, existsTimeLineHistory()
> calls RestoreArchivedFile(), but that restores it using a temporary
> name, and KeepFileRestoredFromArchive is not called, so we might have
> the timeline history in RECOVERYHISTORY but that's not the filename
> we're actually going to try to read from inside readTimeLineHistory().
> In the second case, findNewestTimeLine() will call
> existsTimeLineHistory() which results in the same situation. So I
> think when readRecoveryCommandFile() returns expectedTLI can be
> different but the history file can be absent since it was only ever
> restored under a temporary name.

Anyway, it seems that the commit tried to fix an issue that happens
without using a WAL archive.

https://www.postgresql.org/message-id/50E43C57.5050101%40vmware.com

> That leaves one case not covered: If you take a backup with plain 
> "pg_basebackup" from a standby, without -X, and the first WAL segment 
> contains a timeline switch (ie. you take the backup right after a 
> failover), and you try to recover from it without a WAL archive, it 
> doesn't work. This is the original issue that started this thread, 
> except that I used "-x" in my original test case. The problem here is 
> that even though streaming replication will fetch the timeline history 
> file when it connects, at the very beginning of recovery, we expect that 
> we already have the timeline history file corresponding the initial 
> timeline available, either in pg_xlog or the WAL archive. By the time 
> streaming replication has connected and fetched the history file, we've 
> already initialized expectedTLEs to contain just the latest TLI. To fix 
> that, we should delay calling readTimeLineHistoryFile() until streaming 
> replication has connected and fetched the file.
> If the first segment read by recovery contains a timeline switch, the
> first pages have an older timeline than the segment timeline. However, if
> expectedTLEs contained only the segment timeline, we cannot know if
> we can use the record.  In that case the following error is issued.

If expectedTLEs is initialized with the pseudo list,
tliOfPointInHistory always returns the recoveryTargetTLI regardless of
the given LSN, so the timeline checks later on don't work. And later,
ReadRecord is surprised to see a page of an unknown timeline:

"unexpected timeline ID 1 in log segment"

So the objective is to initialize expectedTLEs with the right content
of the history file for the recoveryTargetTLI before ReadRecord fetches
the first record.  After the fix, things work as follows.

- recoveryTargetTimeLine is initialized with
  ControlFile->checkPointCopy.ThisTimeLineID

- readRecoveryCommandFile():

  Moves recoveryTargetTLI forward to the specified target timeline if
  the history file for the timeline is found, or, in the case of
  latest, moves it forward up to the maximum timeline among the history
  files found in either pg_wal or the archive.

  !!! In any case, recoveryTargetTLI won't go back behind the checkpoint
  TLI.

- ReadRecord...XLogFileReadAnyTLI

  Tries to load the history file for recoveryTargetTLI, from either
  pg_wal or the archive, into a local TLE list; if the history file is
  not found, uses a generated list with one entry for the
  recoveryTargetTLI.

  (a) If the specified segment file for any timeline in the TLE list
is found, expectedTLEs is initialized with the local list. No need
to worry about expectedTLEs any longer.

  (b) If such a segment is *not* found, expectedTLEs is left
NIL. Usually recoveryTargetTLI is equal to the last checkpoint
TLI.

  (c) However, in the case where timeline switches happened in the
segment and the recoveryTargetTLI has 

Re: multi-install PostgresNode fails with older postgres versions

2021-05-20 Thread Andrew Dunstan


On 5/20/21 9:49 PM, Michael Paquier wrote:
> On Thu, May 20, 2021 at 08:25:45AM -0400, Andrew Dunstan wrote:
>> 5.8 is ancient. Yes I know it's what's in the Msys1 DTK, but the DTK
>> perl seems happy with the list form of pipe open - it's only native
> windows perls that aren't.
> Respect to the Msys1 DTK for that.
>
>> Maybe it's time to update the requirement a bit, at least for running
>> TAP tests.
> Are older versions of the perl MSI that activestate provides hard to
> come by?  FWIW, I would not mind if this README and the docs are
> updated to mention that on Windows we require a newer version for this
> set of MSIs.


I've fixed the coding that led to this particular problem. So for now
let's let sleeping dogs lie.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: multi-install PostgresNode fails with older postgres versions

2021-05-20 Thread Michael Paquier
On Thu, May 20, 2021 at 08:25:45AM -0400, Andrew Dunstan wrote:
> 5.8 is ancient. Yes I know it's what's in the Msys1 DTK, but the DTK
> perl seems happy with the list form of pipe open - it's only native
> windows perls that aren't.

Respect to the Msys1 DTK for that.
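
For reference, the construct in question is the list form of pipe open,
which native Windows perls only support from 5.22 on (a sketch, not the
actual PostgresNode code):

    # List form of pipe open: avoids going through the shell
    open(my $out, '-|', 'pg_config', '--version')
        or die "could not run pg_config: $!";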

> Maybe it's time to update the requirement a bit, at least for running
> TAP tests.

Are older versions of the perl MSI that activestate provides hard to
come by?  FWIW, I would not mind if this README and the docs are
updated to mention that on Windows we require a newer version for this
set of MSIs.
--
Michael


signature.asc
Description: PGP signature


Re: Force disable of SSL renegociation in the server

2021-05-20 Thread Michael Paquier
On Thu, May 20, 2021 at 02:15:52PM +0200, Daniel Gustafsson wrote:
> On 20 May 2021, at 13:00, Michael Paquier  wrote:
>> - SSL_OP_NO_RENEGOTIATION controls that.  It is present in OpenSSL >=
>> 1.1.1 and has been backported in 1.1.0h (it is not present in older
>> versions of 1.1.0).
> 
> For OpenSSL 1.1.0 versions < 1.1.0h it will be silently accepted without
> actually doing anything, so we might want to combine it with the below.

Yeah, still that stresses me quite a bit.  OpenSSL does not have a
good history with compatibility, and we are talking about something
that does not officially exist on the map.

>> - In 1.0.2 and older versions, OpenSSL has an undocumented flag called
>> SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS, able to do the same as far as I
>> understand.
> 
> Well, it's documented in the changelog that it's undocumented (sigh..) along
> with a note stating that it works like SSL_OP_NO_RENEGOTIATION.

I'd say that this is still part of the definition of undocumented.
There is no mention of it in their online documentation :)

> Skimming the
> code it seems to ring true.  For older OpenSSL versions there's also
> SSL_OP_ALLOW_UNSAFE_LEGACY_RENEGOTIATION which controls renegotiation for an
> older OpenSSL reneg bug.  That applies to 0.9.8 versions which we don't
> support, but a malicious user can craft whatever they feel like so maybe we
> should ensure it's off as well?

If I am getting it right by reading upstream, SSL_OP_NO_RENEGOTIATION
takes priority over that.  Hence, if we force SSL_OP_NO_RENEGOTIATION,
then SSL_OP_ALLOW_UNSAFE_LEGACY_RENEGOTIATION has no effect anyway.

> + /* disallow SSL renegociation, option available since 1.1.0h */
> s/renegociation/renegotiation/

Argh, French-ism here.

> +1 on disabling renegotiation, but I think it's worth considering using
> SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS as well.

This one can be set within ssl->s3->flags in the port information.
Still, that's not completely feasible either, as some versions of
OpenSSL hide the internals of a bunch of internal structures, and some
distributions patch the upstream code?  At the end of the day, I think
that I would stick with simplicity and use SSL_OP_NO_RENEGOTIATION.
It is not our job to go around any decision OpenSSL has poorly done
either over the years.  At least this part is officially documented :)
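
For illustration, a minimal sketch of forcing the option where it
exists (guarded at compile time since older OpenSSL versions lack the
define; the context variable name is illustrative):

    #ifdef SSL_OP_NO_RENEGOTIATION
        /* applies to TLSv1.2 and earlier; TLSv1.3 has no renegotiation */
        SSL_CTX_set_options(context, SSL_OP_NO_RENEGOTIATION);
    #endif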

> One could also argue that extending
> the comment with a note that it only applies to TLSv1.2 and lower could be
> helpful to readers who aren't familiar with TLS protocol versions.  TLSv1.3 
> did
> away with renegotiation.

Good idea to mention that.
--
Michael


signature.asc
Description: PGP signature


Re: Transactions involving multiple postgres foreign servers, take 2

2021-05-20 Thread Masahiko Sawada
On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda  wrote:
>
>
> On 2021/05/11 13:37, Masahiko Sawada wrote:
> > I've attached the updated patches that incorporated comments from
> > Zhihong and Ikeda-san.
>
> Thanks for updating the patches!
>
>
> I have other comments including trivial things.
>
>
> a. about the "foreign_transaction_resolver_timeout" parameter
>
> Now, the default value of "foreign_transaction_resolver_timeout" is
> 60 secs. Is there any reason? Although the following is a minor case,
> it may confuse some users.
>
> An example case:
>
> 1. a client executes a transaction with 2PC while the resolver is
> processing FdwXactResolverProcessInDoubtXacts().
>
> 2. the resolution of the 1st transaction must wait until the other
> transactions for 2PC are executed, or until the timeout.
>
> 3. if the client checks the 1st result value, it should wait until the
> resolution is finished for atomic visibility (although this depends on
> how atomic visibility is realized). The clients may wait up to
> "foreign_transaction_resolver_timeout". Users may think it's stale.
>
> A situation like this can be observed after testing with pgbench: some
> unresolved transactions remain after benchmarking.
>
> I assume that this default value refers to wal_sender, archiver, and
> so on. But I think this parameter is more like "commit_delay". If so,
> 60 seconds seems to be a big value.

IIUC this situation seems like the foreign transaction resolution is
the bottleneck and doesn’t catch up to incoming resolution requests.
But how does foreign_transaction_resolver_timeout relate to this
situation? foreign_transaction_resolver_timeout controls when to
terminate a resolver process that doesn't have any foreign
transactions to resolve. So if we set it to several milliseconds,
resolver processes are terminated immediately after each resolution,
imposing the cost of launching resolver processes on the next
resolution.

>
>
> b. about performance bottleneck (just sharing my simple benchmark results)
>
> The resolver process can easily become a performance bottleneck,
> although I think some users want this feature even if the performance
> is not so good.
>
> I tested with a very simple workload on my laptop.
>
> The test conditions are
> * two remote foreign partitions, and one transaction inserts an entry
> into each partition.
> * local connections only. If NW latency became higher, the performance
> became worse.
> * pgbench with 8 clients.
>
> The test results are the following. The performance with 2PC is only
> 10% of the performance without 2PC.
>
> * with foreign_twophase_commit = required
> -> If loaded with more than 10TPS, the number of unresolved foreign
> transactions keeps increasing, and it stops with the warning "Increase
> max_prepared_foreign_transactions".

What was the value of max_prepared_foreign_transactions?

To speed up the foreign transaction resolution, some ideas have been
discussed. As another idea, how about launching resolvers for each
foreign server? That way, we resolve foreign transactions on each
foreign server in parallel. If foreign transactions are concentrated
on a particular server, we can have multiple resolvers for that one
foreign server. It doesn’t change the fact that all foreign
transaction resolutions are processed by resolver processes.

Apart from that, we also might want to improve foreign transaction
management so that a transaction doesn’t end up with an error if the
foreign transaction resolution doesn’t catch up with incoming
transactions that require 2PC. Maybe we can evict and serialize a
state file when FdwXactCtl->xacts[] is full. I’d like to leave it as a
future improvement.

> * with foreign_twophase_commit = disabled
> -> 122TPS in my environments.

How much is the performance without those 2PC patches and with the
same workload? i.e., how fast is the current postgres_fdw that uses
XactCallback?

>
>
> c. v36-0001-Introduce-transaction-manager-for-foreign-transa.patch
>
> * typo: s/tranasction/transaction/
>
> * Is it better to move AtEOXact_FdwXact() in AbortTransaction() to before
> "if (IsInParallelMode())", to make the order the same as in
> CommitTransaction()?

I'd prefer to move AtEOXact_FdwXact() in CommitTransaction after "if
(IsInParallelMode())" since other pre-commit work is done after
cleaning up parallel contexts. What do you think?

>
> * function names in fdwxact.c
>
> Although this depends on my feeling, xact means transaction. If you feel
> the same, the function names of FdwXactRegisterXact and so on are odd to
> me. Would FdwXactRegisterEntry or FdwXactRegisterParticipant be better?
>

FdwXactRegisterEntry sounds good to me. Thanks.

> * Are the following better?
>
> - s/to register the foreign transaction by/to register the foreign transaction
> participant by/
>
> - s/The registered foreign transactions/The registered participants/
>
> - s/given foreign transaction/given foreign transaction participant/
>
> - s/Foreign transactions involved in the current transaction/Foreign
> 

Re: Subscription tests fail under CLOBBER_CACHE_ALWAYS

2021-05-20 Thread Michael Paquier
On Thu, May 20, 2021 at 02:57:40PM -0400, Tom Lane wrote:
> Here's a version that does it like that.  I'm not entirely convinced
> whether this is better or not.

Hmm.  I think that this is better.  This makes the code easier to
follow, and the extra information is useful for debugging.

The change looks good to me.
--
Michael


signature.asc
Description: PGP signature


Re: Alias collision in `refresh materialized view concurrently`

2021-05-20 Thread Michael Paquier
On Thu, May 20, 2021 at 09:14:45PM +0530, Bharath Rupireddy wrote:
> On Thu, May 20, 2021 at 7:52 PM Bernd Helmle  wrote:
>> "mv" looks like a very common alias (i use it all over the time when
>> testing or playing around with materialized views, so i'm wondering why
>>  i didn't see this issue already myself). So the risk here for such a
>>  collision looks very high. We can try to lower this risk by choosing an
>>  alias name, which is not so common. With a static alias however you get
>>  a static error condition, not something that fails here and then.
> 
> Another idea is to use the random() function to generate the required
> number of uint32 random values (refresh_by_match_merge might need 3
> values to replace newdata, newdata2 and mv) and use names like
> pg_temp_rmv_<>, pg_temp_rmv_<> and so on. This would make the names
> unguessable. Note that we use this in choose_dsm_implementation,
> dsm_impl_posix.

I am not sure that I see the point of using a random() number here
while the backend ID, or just the PID, would easily provide enough
entropy for this internal alias.  I agree that "mv" is a bad choice
for this alias name.  One thing that comes to mind here is to use an
alias similar to what we do for dropped attributes, say
pg.matview.%d where %d is the PID.  This is very unlikely
to cause conflicts.
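
For illustration, building such an alias could look like this (a
sketch; the buffer and its surroundings are made up):

    char        diffname[NAMEDATALEN];

    snprintf(diffname, sizeof(diffname), "pg.matview.%d", MyProcPid);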
--
Michael


signature.asc
Description: PGP signature


Re: Installation of regress.so?

2021-05-20 Thread Michael Paquier
On Thu, May 20, 2021 at 09:30:37AM -0400, Tom Lane wrote:
> I'd be okay with it being some sort of non-default option.

Okay.  It would be possible to control that with an environment
variable.  However I am wondering if it would not be more user-friendly
for automated environments if we had a configure switch to control
whether things related to the tests are installed or not.  Say a
--with-test-install?
--
Michael


signature.asc
Description: PGP signature


Re: PG 14 release notes, first draft

2021-05-20 Thread Bruce Momjian
On Thu, May 20, 2021 at 04:19:50PM -0700, Peter Geoghegan wrote:
> On Thu, May 20, 2021 at 2:37 PM Bruce Momjian  wrote:
> > I went with this text:
> >
> > Reduce the default value of <xref linkend="guc-vacuum-cost-page-miss"/>
> > from 10 to 2
> > (Peter Geoghegan)
> >
> > I think with numbers, and the fact we are saying "decrease" havint them
> > in the from/to order is best.  If this was non-numeric, like to scram
> > from md5, it would make more sense to use to/from.
> 
> The point of this change was to make the cost of dirtying pages much
> higher than everything else, since writes are in effect much more
> expensive on modern hardware. Don't know if you need to say that.

I think our text "This new default better reflects current hardware
capabilities." is detailed enough.  People can dig into the item to see
what it does and how it adjusts costs.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: array_cat anycompatible change is breaking xversion upgrade tests

2021-05-20 Thread Tom Lane
Justin Pryzby  writes:
> On Wed, Nov 04, 2020 at 07:43:51PM -0500, Tom Lane wrote:
>> As was discussed in the thread leading up to that commit, modifying the
>> signature of array_cat and friends could break user-defined operators
>> and aggregates based on those functions.  It seems to me that the
>> usability gain from this change is worth that cost, but it is causing
>> an issue for xversion tests.

> But I think this should be called out as an incompatible change in the release
> notes.

If it was not, yes it should be.

regards, tom lane




Re: array_cat anycompatible change is breaking xversion upgrade tests

2021-05-20 Thread Justin Pryzby
On Wed, Nov 04, 2020 at 07:43:51PM -0500, Tom Lane wrote:
> crake is showing xversion upgrade failures since 9e38c2bb50:
> 
> pg_restore: error: could not execute query: ERROR:  function 
> array_cat(anyarray, anyarray) does not exist
> Command was: CREATE AGGREGATE "public"."array_cat_accum"("anyarray") (
> SFUNC = "array_cat",
> STYPE = "anyarray",
> INITCOND = '{}'
> );
> 
> As was discussed in the thread leading up to that commit, modifying the
> signature of array_cat and friends could break user-defined operators
> and aggregates based on those functions.  It seems to me that the
> usability gain from this change is worth that cost, but it is causing
> an issue for xversion tests.

I upgraded an internal DB to v14b1, but it took several tries, since there
were errors during pg_restore regarding aggregates using polymorphic
functions taking anyarray, which is now anycompatiblearray.

I succeeded in upgrading after dropping our aggregates.

I have a backup from the v13 DB, and it restores okay on v13.
However it fails with the same errors when restoring into v14.

I think this was all known, so I'm just adding a data point.

It'd be easy enough to replace our "anyarrays" with "anycompatiblearrays".

But I think this should be called out as an incompatible change in the release
notes.

pg_restore: error: could not execute query: ERROR:  function 
array_append(anyarray, anyelement) does not exist
Command was: CREATE AGGREGATE public.array_accum(anyelement) (
SFUNC = array_append,
STYPE = anyarray,
INITCOND = '{}',
PARALLEL = safe
);

pg_restore: error: could not execute query: ERROR:  function 
array_append(anyarray, anyelement) does not exist
Command was: CREATE AGGREGATE public.pdp_context_count(anyelement) (
SFUNC = array_append,
STYPE = anyarray,
INITCOND = '{}',
FINALFUNC = public._final_pdp_context_count,
PARALLEL = safe
);

pg_restore: error: could not execute query: ERROR:  function 
array_append(anyarray, anyelement) does not exist
Command was: CREATE AGGREGATE public.ts_mode(anyelement) (
SFUNC = array_append,
STYPE = anyarray,
INITCOND = '{}',
FINALFUNC = public._final_mode
);
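
For anyone hitting the same thing, the replacement amounts to something
like this (a sketch based on the first failing aggregate above; in v14
array_append takes anycompatiblearray/anycompatible):

    CREATE AGGREGATE public.array_accum(anycompatible) (
        SFUNC = array_append,
        STYPE = anycompatiblearray,
        INITCOND = '{}',
        PARALLEL = safe
    );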

-- 
Justin




Re: PG 14 release notes, first draft

2021-05-20 Thread Peter Geoghegan
On Thu, May 20, 2021 at 2:37 PM Bruce Momjian  wrote:
> I went with this text:
>
> Reduce the default value of <xref linkend="guc-vacuum-cost-page-miss"/>
> from 10 to 2
> (Peter Geoghegan)
>
> I think with numbers, and the fact we are saying "decrease", having them
> in the from/to order is best.  If this were non-numeric, like the change
> to scram from md5, it would make more sense to use to/from.

The point of this change was to make the cost of dirtying pages much
higher than everything else, since writes are in effect much more
expensive on modern hardware. Don't know if you need to say that.


-- 
Peter Geoghegan




Re: PG 14 release notes, first draft

2021-05-20 Thread Bruce Momjian
On Thu, May 20, 2021 at 04:35:07PM -0400, Álvaro Herrera wrote:
> On 2021-May-20, Bruce Momjian wrote:
> 
> > On Thu, May 20, 2021 at 02:55:18PM -0500, Justin Pryzby wrote:
> > > On Thu, May 20, 2021 at 03:44:46PM -0400, Bruce Momjian wrote:
> > > > > | Reduce the default value of vacuum_cost_page_miss (Peter Geoghegan) 
> > > > > |  This new default better reflects current hardware capabilities. 
> > > > > Also say: the previous default was 10.
> > > > 
> > > > Uh, we didn't report the new value, so why report the old one?
> > > 
> > > Good point.
> > > For symmetry with this one, maybe the old and new values should be 
> > > included?
> > 
> > Not sure.  Those values are kind of hard to understand, so I am afraid
> > there would be more confusion by mentioning them.
> > 
> > > |Change checkpoint_completion_target default to 0.9 (Stephen Frost)
> > > |The previous default was 0.5.
> > 
> > Uh, that one is frequently modified by users, to an extent I didn't
> > understand why we kept it at 0.5 for so long, which is why I mentioned
> > it.
> 
> You also mentioned 'md5' in the entry about password_encryption,
> remember?  I tend to agree with Justin: if it's not too much extra space
> to mention both values, let's just do that.  "Reduce the value of X to Y
> from Z.  The new default better reflects ..." seems OK to me.
> 
> I prefer "to Y from Z" rather than "from Z to Y", because then the new
> value appears first, which seems a tiny improvement in readability,
> though the phrase is in the opposite order of traditional.  Also it
> seems better than "change value of X to Y.  The previous default was Z"
> because it then becomes a little more verbose than really needed.  But
> maybe that's OK too.

I went with this text:

Reduce the default value of <xref linkend="guc-vacuum-cost-page-miss"/>
from 10 to 2
(Peter Geoghegan)

I think with numbers, and the fact we are saying "decrease", having them
in the from/to order is best.  If this were non-numeric, like the change
to scram from md5, it would make more sense to use to/from.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Postgres perl module namespace

2021-05-20 Thread Tom Lane
Andrew Dunstan  writes:
> While solving a problem with the Beta RPMs, I noticed that they export
> our perl test modules as capabilities like this:

> [andrew@f34 x86_64]$ rpm -q --provides -p
> postgresql14-devel-14-beta1_PGDG.fc34.x86_64.rpm | grep ^perl
> perl(PostgresNode)
> perl(PostgresVersion)
> perl(RecursiveCopy)
> perl(SimpleTee)
> perl(TestLib)

> I don't think we should be putting this stuff in a global namespace like
> that. We should invent a namespace that's not likely to conflict with
> other people's, like, say, 'PostgreSQL::Test', to put these modules in. That
> would require moving some code around and adjusting a bunch of scripts,
> but it would not be difficult. Maybe something to be done post-14?

Agreed that we ought to namespace these better.  It looks to me like most
of these are several versions old.  Given the lack of field complaints,
I'm content to wait for v15 for a fix, rather than treating it as an open
item for v14.

> Meanwhile I would suggest that RPM maintainers exclude both requires and
> provides for these five names.

+1

regards, tom lane




Re: PG 14 release notes, first draft

2021-05-20 Thread Alvaro Herrera
On 2021-May-20, Bruce Momjian wrote:

> On Thu, May 20, 2021 at 02:55:18PM -0500, Justin Pryzby wrote:
> > On Thu, May 20, 2021 at 03:44:46PM -0400, Bruce Momjian wrote:
> > > > | Reduce the default value of vacuum_cost_page_miss (Peter Geoghegan) 
> > > > |  This new default better reflects current hardware capabilities. 
> > > > Also say: the previous default was 10.
> > > 
> > > Uh, we didn't report the new value, so why report the old one?
> > 
> > Good point.
> > For symmetry with this one, maybe the old and new values should be included?
> 
> Not sure.  Those values are kind of hard to understand, so I am afraid
> there would be more confusion by mentioning them.
> 
> > |Change checkpoint_completion_target default to 0.9 (Stephen Frost)
> > |The previous default was 0.5.
> 
> Uh, that one is frequently modified by users, to an extent I didn't
> understand why we kept it at 0.5 for so long, which is why I mentioned
> it.

You also mentioned 'md5' in the entry about password_encryption,
remember?  I tend to agree with Justin: if it's not too much extra space
to mention both values, let's just do that.  "Reduce the value of X to Y
from Z.  The new default better reflects ..." seems OK to me.

I prefer "to Y from Z" rather than "from Z to Y", because then the new
value appears first, which seems a tiny improvement in readability,
though the phrase is in the opposite order of traditional.  Also it
seems better than "change value of X to Y.  The previous default was Z"
because it then becomes a little more verbose than really needed.  But
maybe that's OK too.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"La fuerza no está en los medios físicos
sino que reside en una voluntad indomable" (Gandhi)




Re: "Multiple table synchronizations are processed serially" still happens

2021-05-20 Thread Guillaume Lelarge
Hi,

On Thu, May 20, 2021 at 12:09 PM Amit Kapila  wrote:

> On Wed, Apr 28, 2021 at 2:43 PM Guillaume Lelarge
>  wrote:
> >
> > And it logs the "still waiting" message as long as the first table is
> > being synchronized. Once this is done, it releases the lock, and the
> > synchronization of the second table starts.
> >
> > Is there something I didn't understand on the previous thread?
> >
>
> It seems from a script that you are creating a subscription on the
> same node as publication though in a different DB, right?


Yes, that's right.

> If so, the
> problem might be that copying the data of the first table creates a
> transaction which blocks creation of the slot for second table copy.
>

I don't understand how a transaction could block the creation of a slot.
Could you explain that to me, or do you know where this is explained in
the documentation?

> The commit you referred to will just fix the problem while reading the
> data from the publisher not while writing data in the table in the
> subscriber.
>
> > I'd like to know why serial synchronization happens, and if there's a
> > way to avoid it.
> >
>
> I guess you need to create a subscription on a different node.
>
>
Thanks.


-- 
Guillaume.


Re: PG 14 release notes, first draft

2021-05-20 Thread Bruce Momjian
On Thu, May 20, 2021 at 02:55:18PM -0500, Justin Pryzby wrote:
> On Thu, May 20, 2021 at 03:44:46PM -0400, Bruce Momjian wrote:
> > > | Reduce the default value of vacuum_cost_page_miss (Peter Geoghegan) 
> > > |  This new default better reflects current hardware capabilities. 
> > > Also say: the previous default was 10.
> > 
> > Uh, we didn't report the new value, so why report the old one?
> 
> Good point.
> For symmetry with this one, maybe the old and new values should be included?

Not sure.  Those values are kind of hard to understand, so I am afraid
there would be more confusion by mentioning them.

> |Change checkpoint_completion_target default to 0.9 (Stephen Frost)
> |The previous default was 0.5.

Uh, that one is frequently modified by users, to an extent I didn't
understand why we kept it at 0.5 for so long, which is why I mentioned
it.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: PG 14 release notes, first draft

2021-05-20 Thread Justin Pryzby
On Thu, May 20, 2021 at 03:44:46PM -0400, Bruce Momjian wrote:
> > | Reduce the default value of vacuum_cost_page_miss (Peter Geoghegan) 
> > |  This new default better reflects current hardware capabilities. 
> > Also say: the previous default was 10.
> 
> Uh, we didn't report the new value, so why report the old one?

Good point.
For symmetry with this one, maybe the old and new values should be included?

|Change checkpoint_completion_target default to 0.9 (Stephen Frost)
|The previous default was 0.5.

-- 
Justin




Postgres perl module namespace

2021-05-20 Thread Andrew Dunstan


While solving a problem with the Beta RPMs, I noticed that they export
our perl test modules as capabilities like this:

[andrew@f34 x86_64]$ rpm -q --provides -p
postgresql14-devel-14-beta1_PGDG.fc34.x86_64.rpm | grep ^perl
perl(PostgresNode)
perl(PostgresVersion)
perl(RecursiveCopy)
perl(SimpleTee)
perl(TestLib)


I don't think we should be putting this stuff in a global namespace like
that. We should invent a namespace that's not likely to conflict with
other people, like, say, 'PostgreSQL::Test' to put these modules. That
would require moving some code around and adjusting a bunch of scripts,
but it would not be difficult. Maybe something to be done post-14?
Meanwhile I would suggest that RPM maintainers exclude both requires and
provides for these five names.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: PG 14 release notes, first draft

2021-05-20 Thread Bruce Momjian
On Wed, May 19, 2021 at 09:39:08AM -0500, Justin Pryzby wrote:
> These sound weird since markup was added in 6a5bde7d4:
> https://www.postgresql.org/docs/devel/release-14.html
> | Remove server and Chapter 34 support for the version 2 wire protocol 
> (Heikki Linnakangas)
> ...
> | Pass doubled quote marks in Chapter 36 SQL command strings literally (Tom 
> Lane)

> -Remove server and libpq support for the version 2 wire protocol (Heikki 
> Linnakangas)
> +Remove server and <xref linkend="libpq"/> support for the version 2 <link linkend="protocol">wire protocol</link> (Heikki Linnakangas)

Agreed, fixed.

> > Force custom server variable names to match the pattern used for unquoted 
> > SQL identifiers (Tom Lane)
> Say "Require" not force?

Agreed, fixed.

> > This setting was disabled in PostgreSQL version 13.3.
> "disabled" sounds like it was set to "off".  Maybe say it was ignored.

OK, I went with this:

This setting was ignored starting in
PostgreSQL version 13.3.

> > Add long-running queries to be canceled if the client disconnects (Sergey 
> > Cherkashin, Thomas Munro)
> Should say: Allow

Yes.

> > The server variable client_connection_check_interval allows supporting 
> > operating systems, e.g., Linux, to automatically cancel queries by 
> > disconnected clients.
> say "some operating systems" ?

Agreed, done.

> > This can be disabled by turning client option "sslsni" off.
> "turning off"

Agreed.

> | Add %P to log_line_prefix to report the parallel group leader (Justin 
> Pryzby)
> 
> Maybe it should say "Allow %P in log_line_prefix to ...", otherwise it sounds
> like the default was changed.

I am not sure, but I changed it as you suggested:

Allow %P in log_line_prefix to report the
parallel group leader (Justin Pryzby)

> | Reduce the default value of vacuum_cost_page_miss (Peter Geoghegan) 
> |  This new default better reflects current hardware capabilities. 
> Also say: the previous default was 10.

Uh, we didn't report the new value, so why report the old one?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Subscription tests fail under CLOBBER_CACHE_ALWAYS

2021-05-20 Thread Tom Lane
I wrote:
> Michael Paquier  writes:
>> On Wed, May 19, 2021 at 04:23:55PM -0400, Tom Lane wrote:
>>> * Replace the edata->resultRelInfo field with two fields, one for
>>> the original parent and one for the actual/current target.  Perhaps
>>> this is worth doing, not sure.

>> This one sounds more natural to me, though.

> OK, I'll give that a try tomorrow.

Here's a version that does it like that.  I'm not entirely convinced
whether this is better or not.

regards, tom lane

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 1432554d5a..d19269ebce 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -178,6 +178,18 @@ static HTAB *xidhash = NULL;
 /* BufFile handle of the current streaming file */
 static BufFile *stream_fd = NULL;
 
+typedef struct ApplyExecutionData
+{
+	EState	   *estate;			/* executor state, used to track resources */
+	ResultRelInfo *origRelInfo; /* the originally-named target relation */
+	ResultRelInfo *activeRelInfo;	/* the actual target relation */
+	/* activeRelInfo can equal origRelInfo, or be for a partition of it */
+
+	/* These fields are used when the target relation is partitioned: */
+	ModifyTableState *mtstate;	/* dummy ModifyTable state */
+	PartitionTupleRouting *proute;	/* partition routing info */
+} ApplyExecutionData;
+
 typedef struct SubXactInfo
 {
 	TransactionId xid;			/* XID of the subxact */
@@ -226,21 +238,20 @@ static void apply_dispatch(StringInfo s);
 
 static void apply_handle_commit_internal(StringInfo s,
 		 LogicalRepCommitData *commit_data);
-static void apply_handle_insert_internal(ResultRelInfo *relinfo,
-		 EState *estate, TupleTableSlot *remoteslot);
-static void apply_handle_update_internal(ResultRelInfo *relinfo,
-		 EState *estate, TupleTableSlot *remoteslot,
+static void apply_handle_insert_internal(ApplyExecutionData *edata,
+		 TupleTableSlot *remoteslot);
+static void apply_handle_update_internal(ApplyExecutionData *edata,
+		 TupleTableSlot *remoteslot,
 		 LogicalRepTupleData *newtup,
 		 LogicalRepRelMapEntry *relmapentry);
-static void apply_handle_delete_internal(ResultRelInfo *relinfo, EState *estate,
+static void apply_handle_delete_internal(ApplyExecutionData *edata,
 		 TupleTableSlot *remoteslot,
 		 LogicalRepRelation *remoterel);
 static bool FindReplTupleInLocalRel(EState *estate, Relation localrel,
 	LogicalRepRelation *remoterel,
 	TupleTableSlot *remoteslot,
 	TupleTableSlot **localslot);
-static void apply_handle_tuple_routing(ResultRelInfo *relinfo,
-	   EState *estate,
+static void apply_handle_tuple_routing(ApplyExecutionData *edata,
 	   TupleTableSlot *remoteslot,
 	   LogicalRepTupleData *newtup,
 	   LogicalRepRelMapEntry *relmapentry,
@@ -336,20 +347,21 @@ handle_streamed_transaction(LogicalRepMsgType action, StringInfo s)
 
 /*
  * Executor state preparation for evaluation of constraint expressions,
- * indexes and triggers.
+ * indexes and triggers for the specified relation.
  *
- * resultRelInfo is a ResultRelInfo for the relation to be passed to the
- * executor routines.  The caller must open and close any indexes to be
- * updated independently of the relation registered here.
+ * Note that the caller must open and close any indexes to be updated.
  */
-static EState *
-create_estate_for_relation(LogicalRepRelMapEntry *rel,
-		   ResultRelInfo **resultRelInfo)
+static ApplyExecutionData *
+create_edata_for_relation(LogicalRepRelMapEntry *rel)
 {
+	ApplyExecutionData *edata;
 	EState	   *estate;
 	RangeTblEntry *rte;
+	ResultRelInfo *resultRelInfo;
 
-	estate = CreateExecutorState();
+	edata = (ApplyExecutionData *) palloc0(sizeof(ApplyExecutionData));
+
+	edata->estate = estate = CreateExecutorState();
 
 	rte = makeNode(RangeTblEntry);
 	rte->rtekind = RTE_RELATION;
@@ -358,13 +370,16 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel,
 	rte->rellockmode = AccessShareLock;
 	ExecInitRangeTable(estate, list_make1(rte));
 
-	*resultRelInfo = makeNode(ResultRelInfo);
+	edata->origRelInfo = resultRelInfo = makeNode(ResultRelInfo);
+
+	/* Initially, set activeRelInfo to be the named rel */
+	edata->activeRelInfo = resultRelInfo;
 
 	/*
 	 * Use Relation opened by logicalrep_rel_open() instead of opening it
 	 * again.
 	 */
-	InitResultRelInfo(*resultRelInfo, rel->localrel, 1, NULL, 0);
+	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
 
 	/*
 	 * We put the ResultRelInfo in the es_opened_result_relations list, even
@@ -377,29 +392,38 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel,
 	 * an apply operation being responsible for that.
 	 */
 	estate->es_opened_result_relations =
-		lappend(estate->es_opened_result_relations, *resultRelInfo);
+		lappend(estate->es_opened_result_relations, resultRelInfo);
 
 	estate->es_output_cid = 

Re: SSL Tests for sslinfo extension

2021-05-20 Thread Daniel Gustafsson
> On 19 May 2021, at 21:05, Andrew Dunstan  wrote:
> 
> On 5/19/21 1:01 PM, Dagfinn Ilmari Mannsåker wrote:
>> Daniel Gustafsson  writes:
>> 
>>> In order to be able to test extensions with SSL connections, allow
>>> configure_test_server_for_ssl to create any extensions passed as
>>> comma separated list. Each extension is created in all the test
>>> databases which may or may not be useful.
>> Why the comma-separated string, rather than an array reference,
>> i.e. `extensions => [qw(foo bar baz)]`?  

No real reason, I just haven't written Perl enough lately to "think in Perl".
Fixed in the attached.

>> Also, should it use `CREATE
>> EXTENSION .. CASCADE`, in case the specified extensions depend on
>> others?

Good point.  Each extension will have to be in EXTRA_INSTALL as well of course,
but we should do CASCADE.

> Also, instead of one line per db there should be an inner loop over the
> db  names.

Right, I was lazily using the same approach as for CREATE DATABASE, but when the
list is used in two places it should be a proper list.  Fixed in the attached.

--
Daniel Gustafsson   https://vmware.com/



v2-0001-Extend-configure_test_server_for_ssl-to-add-exten.patch
Description: Binary data


v2-0002-Add-tests-for-sslinfo.patch
Description: Binary data


Re: CALL versus procedures with output-only arguments

2021-05-20 Thread Pavel Stehule
On Thu, May 20, 2021 at 19:53, Tom Lane wrote:

> I'm not too happy with this:
>
> regression=# create procedure p1(out x int) language plpgsql
> regression-# as 'begin x := 42; end';
> CREATE PROCEDURE
>
> regression=# call p1();
> ERROR:  procedure p1() does not exist
> LINE 1: call p1();
>  ^
> HINT:  No procedure matches the given name and argument types. You might
> need to add explicit type casts.
>
> regression=# call p1(null);
>  x
> ----
>  42
> (1 row)
>
> I can see that that makes some sense within plpgsql, where the CALL
> ought to provide a plpgsql variable for each OUT argument.  But it
> seems moderately insane for calls from SQL.  It certainly fails
> to match the documentation [1], which says fairly explicitly that
> the argument list items match the *input* arguments of the procedure,
> and further notes that plpgsql handles output arguments differently.
>
> I think we ought to fix this so that OUT-only arguments are ignored
> when calling from SQL not plpgsql.  This is less than simple, since
> the parser doesn't actually have any context that would let it know
> which one we're doing, but I think we could hack that up somehow.
> (The RawParseMode mechanism seems like one way we could pass the
> info, and there are probably others.)
>

+1

Pavel


> Alternatively, if we're going to stick with this behavior, we have
> to change the docs to explain it.  Either way it seems like an
> open item for v14.  (For those who've forgotten, OUT-only procedure
> arguments are a new thing in v14.)
>
> regards, tom lane
>
> [1] https://www.postgresql.org/docs/devel/sql-call.html
>
>
>


Re: cutting down the TODO list thread

2021-05-20 Thread Bruce Momjian
On Wed, May 19, 2021 at 01:52:03PM -0400, John Naylor wrote:
> Related to the above, but has a more specific approach in mind. The discussion
> thread is not useful for getting one's head around how to think about the
> problem, much less to decide if it's worth working on -- it's mostly
> complaining about the review process. Independent of that, the idea of
> inspecting the buffer cache seems impractical.

Yes, I think you are right about all of these.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Query about time zone patterns in to_char

2021-05-20 Thread Bruce Momjian
On Thu, May 20, 2021 at 12:21:12PM +0530, Nitin Jadhav wrote:
> Thanks Suraj for reviewing the patch.
> 
> > 1:
> > +RESET timezone;
> > +
> > +
> > CREATE TABLE TIMESTAMPTZ_TST (a int , b timestamptz);
> >
> > Extra line.
> >
> > 2:
> > +SET timezone = '00:00';
> > +SELECT to_char(now(), 'of') as "Of", to_char(now(), 'tzh:tzm') as 
> > "tzh:tzm";
> 
> I have fixed these comments.
> 
> > I am not sure whether we should backport this or not but I don't see any
> issues with back-patching.

Only significant fixes are backpatched, not features.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





CALL versus procedures with output-only arguments

2021-05-20 Thread Tom Lane
I'm not too happy with this:

regression=# create procedure p1(out x int) language plpgsql
regression-# as 'begin x := 42; end';
CREATE PROCEDURE

regression=# call p1();
ERROR:  procedure p1() does not exist
LINE 1: call p1();
 ^
HINT:  No procedure matches the given name and argument types. You might need 
to add explicit type casts.

regression=# call p1(null);
 x
----
 42
(1 row)

I can see that that makes some sense within plpgsql, where the CALL
ought to provide a plpgsql variable for each OUT argument.  But it
seems moderately insane for calls from SQL.  It certainly fails
to match the documentation [1], which says fairly explicitly that
the argument list items match the *input* arguments of the procedure,
and further notes that plpgsql handles output arguments differently.

I think we ought to fix this so that OUT-only arguments are ignored
when calling from SQL not plpgsql.  This is less than simple, since
the parser doesn't actually have any context that would let it know
which one we're doing, but I think we could hack that up somehow.
(The RawParseMode mechanism seems like one way we could pass the
info, and there are probably others.)
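
For illustration, a sketch of what that could look like (the
RAW_PARSE_PLPGSQL_CALL member is invented for this sketch; the other
members are the existing ones from parser.h):

    typedef enum RawParseMode
    {
        RAW_PARSE_DEFAULT = 0,
        RAW_PARSE_TYPE_NAME,
        RAW_PARSE_PLPGSQL_EXPR,
        RAW_PARSE_PLPGSQL_ASSIGN1,
        RAW_PARSE_PLPGSQL_ASSIGN2,
        RAW_PARSE_PLPGSQL_ASSIGN3,
        RAW_PARSE_PLPGSQL_CALL  /* hypothetical: CALL coming from plpgsql,
                                 * so OUT arguments appear in the list */
    } RawParseMode;

The parser could then record that mode somewhere the function-resolution
code can see it, and ignore OUT-only arguments in every other mode.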

Alternatively, if we're going to stick with this behavior, we have
to change the docs to explain it.  Either way it seems like an
open item for v14.  (For those who've forgotten, OUT-only procedure
arguments are a new thing in v14.)

regards, tom lane

[1] https://www.postgresql.org/docs/devel/sql-call.html




Re: Race condition in recovery?

2021-05-20 Thread Robert Haas
On Tue, May 18, 2021 at 1:33 AM Dilip Kumar  wrote:
> Yeah, it will be a fake 1-element list.  But just to be clear that
> 1-element can only be "ControlFile->checkPointCopy.ThisTimeLineID" and
> nothing else, do you agree to this?  Because we initialize
> recoveryTargetTLI to this value and we might change it in
> readRecoveryCommandFile() but for that, we must get the history file,
> so if we are talking about the case where we don't have the history
> file then "recoveryTargetTLI" will still be
> "ControlFile->checkPointCopy.ThisTimeLineID".

I don't think your conclusion is correct. As I understand it, you're
talking about the state before
ee994272ca50f70b53074f0febaec97e28f83c4e, because as of now
readRecoveryCommandFile() no longer exists in master. Before that
commit, StartupXLOG did this:

recoveryTargetTLI = ControlFile->checkPointCopy.ThisTimeLineID;
readRecoveryCommandFile();
expectedTLEs = readTimeLineHistory(recoveryTargetTLI);

Now, readRecoveryCommandFile() can change recoveryTargetTLI. Before
doing so, it will call existsTimeLineHistory if
recovery_target_timeline was set to an integer, and findNewestTimeLine
if it was set to latest. In the first case, existsTimeLineHistory()
calls RestoreArchivedFile(), but that restores it using a temporary
name, and KeepFileRestoredFromArchive is not called, so we might have
the timeline history in RECOVERYHISTORY but that's not the filename
we're actually going to try to read from inside readTimeLineHistory().
In the second case, findNewestTimeLine() will call
existsTimeLineHistory() which results in the same situation. So I
think when readRecoveryCommandFile() returns expectedTLI can be
different but the history file can be absent since it was only ever
restored under a temporary name.

> Conclusion:
> - I think now we agree on the point that initializing expectedTLEs
> with the recovery target timeline is the right fix.
> - We still have some differences of opinion about what was the
> original problem in the base code which was fixed by the commit
> (ee994272ca50f70b53074f0febaec97e28f83c4e).

I am also still concerned about whether we understand in exactly what
cases the current logic doesn't work. We seem to more or less agree on
the fix, but I don't think we really understand precisely what case we
are fixing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Function scan FDW pushdown

2021-05-20 Thread Alexander Pyhalov

Hi.

The attached patch allows pushing joins with function RTEs to PostgreSQL 
data sources.

This makes executing queries like this

create foreign table f_pgbench_accounts (aid int, bid int, abalance int, 
filler char(84)) SERVER local_srv OPTIONS (table_name 
'pgbench_accounts');
select * from f_pgbench_accounts join unnest(array[1,2,3]) ON unnest = 
aid;


more efficient.

with patch:

# explain analyze select * from f_pgbench_accounts join 
unnest(array[1,2,3,4,5,6]) ON unnest = aid;

                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=100.00..116.95 rows=7 width=356) (actual time=2.282..2.287 rows=6 loops=1)
   Relations: (f_pgbench_accounts) INNER JOIN (FUNCTION RTE unnest)
 Planning Time: 0.487 ms
 Execution Time: 3.336 ms

without patch:

# explain analyze select * from f_pgbench_accounts join 
unnest(array[1,2,3,4,5,6]) ON unnest = aid;

                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=100.14..158.76 rows=7 width=356) (actual time=2.263..1268.607 rows=6 loops=1)
   Hash Cond: (f_pgbench_accounts.aid = unnest.unnest)
   ->  Foreign Scan on f_pgbench_accounts  (cost=100.00..157.74 rows=217 width=352) (actual time=2.190..1205.938 rows=10 loops=1)
   ->  Hash  (cost=0.06..0.06 rows=6 width=4) (actual time=0.041..0.043 rows=6 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Function Scan on unnest  (cost=0.00..0.06 rows=6 width=4) (actual time=0.025..0.028 rows=6 loops=1)
 Planning Time: 0.389 ms
 Execution Time: 1269.627 ms

So far I don't know how to visualize the actual function expression used in
the function RTE: in postgresExplainForeignScan(), es->rtable comes from
queryDesc->plannedstmt->rtable, and rte->functions is already 0.


--
Best regards,
Alexander Pyhalov,
Postgres Professional

From 6b5ea4c62a1fcd3dad586d4f461cb142834ac266 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov 
Date: Mon, 17 May 2021 19:19:31 +0300
Subject: [PATCH] Function scan FDW pushdown

---
 contrib/postgres_fdw/deparse.c|  82 --
 .../postgres_fdw/expected/postgres_fdw.out| 123 
 contrib/postgres_fdw/postgres_fdw.c   | 273 --
 contrib/postgres_fdw/sql/postgres_fdw.sql |  60 
 src/backend/optimizer/util/relnode.c  |  58 +++-
 5 files changed, 547 insertions(+), 49 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c6..292ba52ea14 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1613,13 +1613,36 @@ deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
 	{
 		RangeTblEntry *rte = planner_rt_fetch(foreignrel->relid, root);
 
-		/*
-		 * Core code already has some lock on each rel being planned, so we
-		 * can use NoLock here.
-		 */
-		Relation	rel = table_open(rte->relid, NoLock);
+		Assert(rte->rtekind == RTE_RELATION || rte->rtekind == RTE_FUNCTION);
+		if (rte->rtekind == RTE_RELATION)
+		{
+			/*
+			 * Core code already has some lock on each rel being planned, so
+			 * we can use NoLock here.
+			 */
+			Relation	rel = table_open(rte->relid, NoLock);
 
-		deparseRelation(buf, rel);
+			deparseRelation(buf, rel);
+
+			table_close(rel, NoLock);
+		}
+		else if (rte->rtekind == RTE_FUNCTION)
+		{
+			RangeTblFunction *rtfunc;
+			deparse_expr_cxt context;
+
+			Assert(list_length(rte->functions) == 1);
+
+			rtfunc = (RangeTblFunction *) linitial(rte->functions);
+
+			context.root = root;
+			context.foreignrel = foreignrel;
+			context.scanrel = foreignrel;
+			context.buf = buf;
+			context.params_list = params_list;
+
+			deparseExpr((Expr *) rtfunc->funcexpr, &context);
+		}
 
 		/*
 		 * Add a unique alias to avoid any conflict in relation names due to
@@ -1628,8 +1651,6 @@ deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
 		 */
 		if (use_alias)
 			appendStringInfo(buf, " %s%d", REL_ALIAS_PREFIX, foreignrel->relid);
-
-		table_close(rel, NoLock);
 	}
 }
 
@@ -2309,29 +2330,40 @@ deparseColumnRef(StringInfo buf, int varno, int varattno, RangeTblEntry *rte,
 		 * If it's a column of a foreign table, and it has the column_name FDW
 		 * option, use that value.
 		 */
-		options = GetForeignColumnOptions(rte->relid, varattno);
-		foreach(lc, options)
+		if (rte->rtekind == RTE_RELATION)
 		{
-			DefElem*def = (DefElem *) lfirst(lc);
-
-			if (strcmp(def->defname, "column_name") == 0)
+			options = GetForeignColumnOptions(rte->relid, varattno);
+			foreach(lc, options)
 			{
-colname = defGetString(def);
-break;
+DefElem*def = (DefElem *) lfirst(lc);
+
+if (strcmp(def->defname, "column_name") == 0)
+{
+	colname = defGetString(def);
+	

Re: Added missing tab completion for alter subscription set option

2021-05-20 Thread Tom Lane
vignesh C  writes:
> On Tue, May 18, 2021 at 9:20 PM Alvaro Herrera  
> wrote:
>> I wish we didn't have to keep knowledge in the psql source on which
>> option names are to be used for each command.  If we had some function
>> SELECT pg_completion_options('alter subscription set');
>> that returned the list of options usable for each command, we wouldn't
>> have to ... psql would just retrieve the list of options for the current
>> command.

> On further analysis, I felt that as psql is a front-end client, we
> should not put any dependency on backend code.  That might be the
> reason the options are mentioned directly in tab-complete instead of
> having any dependency on backend code.

Well, the problem with Alvaro's proposal is how do you square it
with psql's need to support back versions of the server.  Maybe
you could code tab-complete.c like "if server >= v15 then do X
else do Y", but since Y would be largely duplicative of the
server-side knowledge accessed by X, you haven't really gotten
rid of the two-places-that-know-this issue.  And I'm afraid that
tab-complete.c would become even more of a mess than it is now;
although maybe somebody can see a cute way to avoid that.
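
Just to illustrate the shape of such a branch (pg_completion_options()
here is the hypothetical server function from upthread, not something
that exists today):

    else if (Matches("ALTER", "SUBSCRIPTION", MatchAny, "SET", "("))
    {
        if (pset.sversion >= 150000)
            /* ask the server which options it recognizes */
            COMPLETE_WITH_QUERY("SELECT pg_completion_options('alter subscription set')");
        else
            /* fall back to a hard-coded list for older servers */
            COMPLETE_WITH("binary", "slot_name", "streaming", "synchronous_commit");
    }

The else branch is exactly the duplicative copy of server-side knowledge
I'm complaining about.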

regards, tom lane




Re: Added missing tab completion for alter subscription set option

2021-05-20 Thread vignesh C
On Tue, May 18, 2021 at 9:20 PM Alvaro Herrera  wrote:
>
> On 2021-May-14, vignesh C wrote:
>
> > While I was reviewing one of the logical decoding features, I found
> > Streaming and binary options were missing in tab completion for the
> > alter subscription set option, the attached patch has the changes for
> > the same.
> > Thoughts?
>
> I wish we didn't have to keep knowledge in the psql source on which
> option names are to be used for each command.  If we had some function
>  SELECT pg_completion_options('alter subscription set');
> that returned the list of options usable for each command, we wouldn't
> have to ... psql would just retrieve the list of options for the current
> command.
>
> Maintaining such a list does not seem hard -- for example we could just
> have a function alongside parse_subscription_option() that returns the
> names that are recognized by that one.  If we drive the implementation
> of both off a single struct, it would never be outdated.
>

On further analysis, I felt that as psql is a front-end client, we
should not put any dependency on backend code.  That might be the
reason the options are mentioned directly in tab-complete instead of
having any dependency on backend code.  We could have a common module
maintain the options and have both frontend and backend access it, or
should we retain the changes as in the earlier patch?
Thoughts?

Regards,
Vignesh




Re: Clarify how triggers relate to transactions

2021-05-20 Thread Laurenz Albe
On Wed, 2021-05-05 at 11:55 +0200, Laurenz Albe wrote:
> On Wed, 2021-04-28 at 13:24 +0200, Laurenz Albe wrote:
> > On Tue, 2021-04-27 at 14:26 +, PG Doc comments form wrote:
> > > https://www.postgresql.org/docs/current/sql-createtrigger.html mentions 
> > > the
> > > word "transaction" only once, in reference specifically to constraint
> > > triggers: "They can be fired either at the end of the statement causing 
> > > the
> > > triggering event, or at the end of the containing transaction; in the 
> > > latter
> > > case they are said to be deferred."
> > > 
> > > If I understand correctly, it would be helpful to add this sentence or a
> > > corrected version of it: "Triggers always execute in the same transaction 
> > > as
> > > the triggering event, and if a trigger fails, the transaction is rolled
> > > back."
> > 
> > Good idea in principle, but I'd put that information on
> > https://www.postgresql.org/docs/current/trigger-definition.html
> 
> Here is a proposed patch for this.

Replying to -hackers for the commitfest app.

Yours,
Laurenz Albe





Re: Update maintenance_work_mem/autovacuum_work_mem to reflect the 1GB limitation with VACUUM

2021-05-20 Thread Laurenz Albe
On Wed, 2021-05-05 at 12:03 +0200, Laurenz Albe wrote:
> On Mon, 2021-05-03 at 13:48 -0300, Martín Marqués wrote:
> > We should add a line that indicates that there is a limitation (that
> > should be IMO, backported to documentation of earlier versions as it
> > affects all supported versions), at least until such limitation is
> > lifted.
> 
> Here is a patch for that

Just sending a reply to -hackers so I can add it to the commitfest.

Yours,
Laurenz Albe





Re: Alias collision in `refresh materialized view concurrently`

2021-05-20 Thread Bharath Rupireddy
On Thu, May 20, 2021 at 7:52 PM Bernd Helmle  wrote:
>
> On Wednesday, 2021-05-19 at 18:06 +0530, Bharath Rupireddy wrote:
> > > The corresponding Code is in `matview.c` in function
> > > `refresh_by_match_merge`. With adding a prefix like `_pg_internal_`
> > > we
> > > could make collisions pretty unlikely, without intrusive changes.
> > >
> > > The appended patch does this change for the aliases `mv`, `newdata`
> > > and
> > > `newdata2`.
> >
> > I think it's better to have some random name, see below. We could
> > either use "OIDNewHeap" or "MyBackendId" to make those column names
> > unique and almost unguessable. So, something like "pg_temp1_",
> > "pg_temp2_" or "pg_temp3_" and so on would be better IMO.
> >
> > snprintf(NewHeapName, sizeof(NewHeapName), "pg_temp_%u",
> > OIDOldHeap);
> > snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d",
> > MyBackendId);
>
> Hmm, it's an idea, but this can also lead to pretty random failures if
> you have an unlucky user whose query-generating tool had the same idea
> as the backend, no? Not sure that's really better.
>
> With the current implementation of REFRESH MATERIALIZED VIEW
> CONCURRENTLY we always have the problem of possible collisions here,
> you'd never get out of this area without analyzing the whole query for
> such collisions.
>
> "mv" looks like a very common alias (i use it all over the time when
> testing or playing around with materialized views, so i'm wondering why
> i didn't see this issue already myself). So the risk here for such a
> collision looks very high. We can try to lower this risk by choosing an
> alias name, which is not so common. With a static alias however you get
> a static error condition, not something that fails here and then.

Another idea is to use the random() function to generate the required number
of uint32 random values (refresh_by_match_merge might need 3 values to
replace newdata, newdata2 and mv) and use names like
pg_temp_rmv_<random value 1>, pg_temp_rmv_<random value 2> and so on.  This
would make the names unguessable.  Note that we use this in
choose_dsm_implementation and dsm_impl_posix.
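
A minimal sketch of that idea (the variable name and the exact call site
in refresh_by_match_merge() are assumptions, not a worked-out patch):

    char        newdata_alias[NAMEDATALEN];

    /* build a hard-to-guess alias instead of the fixed "newdata" */
    snprintf(newdata_alias, sizeof(newdata_alias), "pg_temp_rmv_%u",
             (unsigned int) random());

The same would be repeated for the other two aliases with fresh random
values.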

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Synchronous commit behavior during network outage

2021-05-20 Thread Ondřej Žižka

On 06/05/2021 06:09, Andrey Borodin wrote:

I could not understand your reasoning about 2 and 4 nodes. Can you please
clarify a bit how a 4-node setup can help prevent visibility of
committed-locally-but-canceled transactions?

Hello Andrey,

The initial request (for us) was to have a geo cluster with 2 locations
where it would be possible to have 2 sync replicas even in case of failure
of one location. This means having 2 nodes in every location (4
together). If one location fails completely (broken network connection),
Patroni will choose the working location (a 5-node etcd in 3 locations
ensures this).


In the initial state, there is 1 sync replica in each location and one 
async replica in each location using as a source the sync replica in its 
location.

Let's have the following initial situation:
1) Nodes pg11 and pg12 are in one location nodes pg21 and pg22 are in 
another location.

2) Nodes pg11 and pg21 are in sync replication
3) Node pg12 is an async replica from pg11
4) Node pg22 is an async replica from pg21
5) Master is pg11.

When the committed-locally-but-canceled situation happens and there is a
problem only with node pg21 (not with the network between nodes), the
async replica pg12 will receive the local commit from pg11 just after
the local commit on pg11, even if the cancellation happens. So there will
be a situation where the commit is present on both pg11 and pg12. If
pg11 fails, the transaction already exists on pg12 and this node will be
selected as the new leader (latest LSN).


There is a window between the time a transaction is committed and the time
it has been sent to the async replica during which we can lose data, but I
expect this to be milliseconds (maybe less).


It will not prevent visibility, but it will ensure that the data is not
lost; in that case, the data can be visible on the leader even if it is
not yet present on the sync replica, because the continuity of data
persistence is ensured by the async replica.


I hope I explained it understandably.

Regards
Ondrej





Re: Bug in query rewriter - hasModifyingCTE not getting set

2021-05-20 Thread Tom Lane
"tsunakawa.ta...@fujitsu.com"  writes:
> From: Tom Lane 
>> I think either the bit about rule_action is unnecessary, or most of
>> the code immediately above this is wrong, because it's only updating
>> flags in sub_action.  Why do you think it's necessary to change
>> rule_action in addition to sub_action?

> Finally, I think I've understood what you meant.  Yes, the current code seems 
> to be wrong.

I'm fairly skeptical of this claim, because that code has stood for a
long time.  Can you provide an example (not involving hasModifyingCTE)
in which it's wrong?

regards, tom lane




Re: multi-install PostgresNode fails with older postgres versions

2021-05-20 Thread Andrew Dunstan


On 5/20/21 9:53 AM, Tom Lane wrote:
> Michael Paquier  writes:
>> On Thu, May 20, 2021 at 07:05:05AM -0400, Andrew Dunstan wrote:
>>> Your version of perl is apparently too old for this. Looks like that
>>> needs to be 5.22 or later: 
>> Hmm.  src/test/perl/README tells about 5.8.0.  That's quite a jump.
> Something odd about that, because my dinosaurs aren't complaining;
> prairiedog for example uses perl 5.8.3.
>
>   



It was only on Windows that this form of pipe open was not supported.
5.22 fixed that.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





RE: Bug in query rewriter - hasModifyingCTE not getting set

2021-05-20 Thread tsunakawa.ta...@fujitsu.com
From: Tom Lane 
> I think either the bit about rule_action is unnecessary, or most of
> the code immediately above this is wrong, because it's only updating
> flags in sub_action.  Why do you think it's necessary to change
> rule_action in addition to sub_action?

Finally, I think I've understood what you meant.  Yes, the current code seems 
to be wrong.  rule_action is different from sub_action only when the rule 
action (the query specified in CREATE RULE) is INSERT SELECT.  In that case, 
rule_action points to the entire INSERT SELECT, while sub_action points to the 
SELECT part.  So, we should add the CTE list and set 
hasModifyingCTE/hasRecursive flags in rule_action.
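
In other words, something along these lines (a sketch against
rewriteRuleAction(), not the final patch):

    /* sub_action gets the CTE list and flags as before ... */
    sub_action->hasModifyingCTE = parsetree->hasModifyingCTE;
    sub_action->hasRecursive = parsetree->hasRecursive;

    /*
     * ... and if the rule action is INSERT ... SELECT, rule_action is a
     * separate Query node, so the flags must be propagated there too.
     */
    if (sub_action != rule_action)
    {
        rule_action->hasModifyingCTE = parsetree->hasModifyingCTE;
        rule_action->hasRecursive = parsetree->hasRecursive;
    }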


> Hm.  So after looking at this more, the problem is that the rewrite
> is producing something equivalent to
> 
> INSERT INTO bug6051_2
> (WITH t1 AS (DELETE FROM bug6051 RETURNING *) SELECT * FROM t1);

Yes.  In this case, the WITH clause must be put before INSERT.

The attached patch is based on your version.  It includes cosmetic changes to
use = instead of |= for boolean variable assignments.  make check passed.
Also, Greg-san's originally failing test case now succeeds.  I confirmed that
the hasModifyingCTE of the top-level rewritten query is now set to true by
looking at the output of debug_print_rewritten and debug_print_plan.


Regards
Takayuki Tsunakawa



v3-0001-propagate-CTE-property-flags-in-rewriter.patch
Description: v3-0001-propagate-CTE-property-flags-in-rewriter.patch


Re: Alias collision in `refresh materialized view concurrently`

2021-05-20 Thread Bernd Helmle
On Wednesday, 2021-05-19 at 18:06 +0530, Bharath Rupireddy wrote:
> > The corresponding Code is in `matview.c` in function
> > `refresh_by_match_merge`. With adding a prefix like `_pg_internal_`
> > we
> > could make collisions pretty unlikely, without intrusive changes.
> > 
> > The appended patch does this change for the aliases `mv`, `newdata`
> > and
> > `newdata2`.
> 
> I think it's better to have some random name, see below. We could
> either use "OIDNewHeap" or "MyBackendId" to make those column names
> unique and almost unguessable. So, something like "pg_temp1_",
> "pg_temp2_" or "pg_temp3_" and so on would be better IMO.
> 
>     snprintf(NewHeapName, sizeof(NewHeapName), "pg_temp_%u",
> OIDOldHeap);
>     snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d",
> MyBackendId);

Hmm, it's an idea, but this can also lead to pretty random failures if
you have an unlucky user whose query-generating tool had the same idea
as the backend, no? Not sure that's really better.

With the current implementation of REFRESH MATERIALIZED VIEW
CONCURRENTLY we always have the problem of possible collisions here,
you'd never get out of this area without analyzing the whole query for
such collisions. 

"mv" looks like a very common alias (i use it all over the time when
testing or playing around with materialized views, so i'm wondering why
i didn't see this issue already myself). So the risk here for such a
collision looks very high. We can try to lower this risk by choosing an
alias name, which is not so common. With a static alias however you get
a static error condition, not something that fails here and then.


-- 
Thanks,
Bernd






Re: multi-install PostgresNode fails with older postgres versions

2021-05-20 Thread Tom Lane
Michael Paquier  writes:
> On Thu, May 20, 2021 at 07:05:05AM -0400, Andrew Dunstan wrote:
>> Your version of perl is apparently too old for this. Looks like that
>> needs to be 5.22 or later: 

> Hmm.  src/test/perl/README tells about 5.8.0.  That's quite a jump.

Something odd about that, because my dinosaurs aren't complaining;
prairiedog for example uses perl 5.8.3.

regards, tom lane




Re: Installation of regress.so?

2021-05-20 Thread Tom Lane
Andrew Dunstan  writes:
> On 5/19/21 10:24 PM, Tom Lane wrote:
>> Michael Paquier  writes:
>>> Could it be possible to install regress.so at least in the same
>>> location as pg_regress?

>> I don't think this is a great idea. ...

> I do agree that we should not install regress.so by default.

I'd be okay with it being some sort of non-default option.

regards, tom lane




Re: Installation of regress.so?

2021-05-20 Thread Andrew Dunstan


On 5/19/21 10:24 PM, Tom Lane wrote:
> Michael Paquier  writes:
>> Could it be possible to install regress.so at least in the same
>> location as pg_regress?
> I don't think this is a great idea.  Aside from the fact that
> we'd be littering the install tree with a .so of no use to end
> users, I'm failing to see how it really gets you anywhere unless
> you want to further require regress.so from back versions to be
> loadable into the current server.
>
>   



We certainly shouldn't want that.  But we do need it for the target
unless we wipe out everything in the source that refers to it. However,
a given installation can be a source in one test and a target in another
- currently we test upgrade to every live version from every known
version less than or equal to that version (currently crake knows about
versions down to 9.2, but it could easily be taught more).


I do agree that we should not install regress.so by default.


cheers


andrew


-- 

Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_rewind fails if there is a read only file.

2021-05-20 Thread Andrew Dunstan


On 5/20/21 6:17 AM, Paul Guo wrote:
>> Presumably the user has a reason for adding the file read-only to the
>> data directory, and we shouldn't lightly ignore that.
>>
>> Michael's advice is reasonable. This seems like a case of:
> I agree. Attached is a short patch to handle the case. The patch was
> tested in my dev environment.



You seem to have missed my point completely. The answer to this problem
IMNSHO is "Don't put a read-only file in the data directory".


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Forget close an open relation in ReorderBufferProcessTXN()

2021-05-20 Thread Amit Langote
On Thu, May 20, 2021 at 5:59 PM osumi.takami...@fujitsu.com
 wrote:
> On Tuesday, May 18, 2021 3:30 PM Amit Langote  wrote:
> > While doing so, it occurred to me (maybe not for the first time) that we are
> > *unnecessarily* doing send_relation_and_attrs() for a relation if the 
> > changes
> > will be published using an ancestor's schema.  In that case, sending only 
> > the
> > ancestor's schema suffices AFAICS.  Changing the code that way doesn't
> > break any tests.  I propose that we fix that too.
> I've analyzed this new change's validity.
> My conclusion for this is that we don't have
> any bad impact from this, which means your additional fix is acceptable.
> I think this addition blurs the purpose of the patch a bit, though.

Okay, I've extracted that change into 0002.

> With the patch's removal of the send_relation_and_attrs() call,
> we don't send the pair of LOGICAL_REP_MSG_TYPE('Y') and
> LOGICAL_REP_MSG_RELATION('R') messages to the subscriber
> when we use an ancestor. Therefore, with the patch we no longer
> register or update the type and relation for maybe_send_schema()'s
> argument 'relation' in the case where the ancestor's schema is used.
> However, both pgoutput_change() and pgoutput_truncate()
> have conditions to check which OIDs to send to the subscriber for any operation.
> Accordingly, that pair of information for the argument 'relation'
> isn't used on the subscriber in that case, and we are fine.

Thanks for checking that.

Here are updated/divided patches.

--
Amit Langote
EDB: http://www.enterprisedb.com


HEAD-v5-0001-pgoutput-fix-memory-management-of-RelationSyncEnt.patch
Description: Binary data


PG13-v5-0002-pgoutput-don-t-send-leaf-partition-schema-when-pu.patch
Description: Binary data


HEAD-v5-0002-pgoutput-don-t-send-leaf-partition-schema-when-pu.patch
Description: Binary data


PG13-v5-0001-pgoutput-fix-memory-management-for-RelationSyncEn.patch
Description: Binary data


RE: Forget close an open relation in ReorderBufferProcessTXN()

2021-05-20 Thread osumi.takami...@fujitsu.com
On Tuesday, May 18, 2021 3:30 PM  Amit Langote  wrote:
> On Mon, May 17, 2021 at 9:45 PM osumi.takami...@fujitsu.com
>  wrote:
> > On Monday, May 17, 2021 6:52 PM Amit Langote
>  wrote:
> > > Both patches are attached.
> > The patch for PG13 can be applied to it cleanly and the RT succeeded.
> >
> > I have a few really minor comments on your comments in the patch.
> >
> > (1) schema_sent's comment
> >
> > @@ -94,7 +94,8 @@ typedef struct RelationSyncEntry
> >
> > /*
> >  * Did we send the schema?  If ancestor relid is set, its schema must also
> > -* have been sent for this to be true.
> > +* have been sent and the map to convert the relation's tuples into the
> > +* ancestor's format created before this can be set to be true.
> >  */
> > bool        schema_sent;
> > List       *streamed_txns;  /* streamed toplevel transactions with this
> >
> >
> > I suggest inserting a comma between 'created' and 'before'
> > because the sentence is a bit long and confusing.
> >
> > Or, here is another idea for this comment, because the original
> > one doesn't cover when the flag gets reset.
> >
> > "To avoid repetition to send the schema, this is set true after its first
> transmission.
> > Reset when any change of the relation definition is possible. If
> > ancestor relid is set, its schema must have also been sent while the
> > map to convert the relation's tuples into the ancestor's format created,
> before this flag is set true."
> >
> > (2) comment in rel_sync_cache_relation_cb()
> >
> > @@ -1190,13 +1208,25 @@ rel_sync_cache_relation_cb(Datum arg, Oid relid)
> >
> > HASH_FIND, NULL);
> >
> > /*
> > -* Reset schema sent status as the relation definition may have changed.
> > +* Reset schema sent status as the relation definition may have changed,
> > +* also freeing any objects that depended on the earlier definition.
> >
> > How about dividing this sentence into two sentences like "Reset schema
> > sent status as the relation definition may have changed.
> > Also, free any objects that depended on the earlier definition."
> 
> Thanks for reading it over.  I have revised comments in a way that hopefully
> addresses your concerns.
Thank you for your fix.
The patches look good to me.

Just in case, I'll report that the two patches succeeded
in the RT as expected, and from my side
there are no more suggestions.
Those are ready for committer, I think.

Best Regards,
Takamichi Osumi



Re: multi-install PostgresNode fails with older postgres versions

2021-05-20 Thread Andrew Dunstan


On 5/20/21 7:15 AM, Michael Paquier wrote:
> On Thu, May 20, 2021 at 07:05:05AM -0400, Andrew Dunstan wrote:
>> Your version of perl is apparently too old for this. Looks like that
>> needs to be 5.22 or later: 
> Hmm.  src/test/perl/README tells about 5.8.0.  That's quite a jump.



Yes. I've pushed a fix that should take care of the issue.


5.8 is ancient. Yes, I know it's what's in the Msys1 DTK, but the DTK
perl seems happy with the list form of pipe open - it's only native
Windows perls that aren't.


Maybe it's time to update the requirement a bit, at least for running
TAP tests.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Freenode woes

2021-05-20 Thread Joe Conway

On 5/19/21 4:27 PM, Robert Treat wrote:

On Wed, May 19, 2021 at 10:19 AM Christoph Berg  wrote:


Fwiw, if the PostgreSQL project is considering moving the #postgresql
IRC channel(s) elsewhere given [1,2], I'm a member of OFTC.net's network
operations committee and would be happy to help.

[1] https://gist.github.com/aaronmdjones/1a9a93ded5b7d162c3f58bdd66b8f491
[2] https://fuchsnet.ch/freenode-resign-letter.txt



I've been wondering the same thing; given our relationship with SPI,
OFTC seems like an option worthy of consideration.
For those unfamiliar, there is additional info about the network at
https://www.oftc.net



+1 (at least so far for me...)

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: Skip partition tuple routing with constant partition key

2021-05-20 Thread Amit Langote
Hou-san,

On Thu, May 20, 2021 at 7:35 PM houzj.f...@fujitsu.com
 wrote:
> From: Amit Langote 
> Sent: Wednesday, May 19, 2021 9:17 PM
> > I guess it would be nice if we could fit in a solution for the use case 
> > that houjz
> > mentioned as a special case.  BTW, houjz, could you please check if a patch 
> > like
> > this one helps the case you mentioned?
>
> Thanks for the patch!
> I did some test on it(using the table you provided above):

Thanks a lot for doing that.

> 1): Test plain column in partition key.
> SQL: insert into foo select 1 from generate_series(1, 1000);
>
> HEAD:
> Time: 5493.392 ms (00:05.493)
>
> AFTER PATCH(skip constant partition key)
> Time: 4198.421 ms (00:04.198)
>
> AFTER PATCH(cache the last partition)
> Time: 4484.492 ms (00:04.484)
>
> The test results of your patch in this case looks good.
> It can fit many more cases and the performance gain is nice.

Hmm yeah, not too bad.

> 2) Test expression in partition key
>
> create or replace function partition_func(i int) returns int as $$
> begin
> return i;
> end;
> $$ language plpgsql immutable parallel restricted;
> create unlogged table foo (a int) partition by range (partition_func(a));
>
> SQL: insert into foo select 1 from generate_series(1, 1000);
>
> HEAD
> Time: 8595.120 ms (00:08.595)
>
> AFTER PATCH(skip constant partition key)
> Time: 4198.421 ms (00:04.198)
>
> AFTER PATCH(cache the last partition)
> Time: 12829.800 ms (00:12.830)
>
> If we add a user-defined function in the partition key, there seems to be
> a performance degradation after the patch.

Oops.

> I did some analysis on it: for the above test case, ExecPartitionCheck
> executes three expressions: 1) key is null 2) key > low 3) key < top.
> In this case, the "key" contains a funcexpr and the funcexpr will be executed
> three times for each row, so it brings extra overhead which causes the
> performance degradation.
>
> IMO, improving the ExecPartitionCheck seems a better solution to it, we can
> Calculate the key value in advance and use the value to do the bound check.
> Thoughts ?

This one seems a bit tough.  ExecPartitionCheck() uses the generic
expression evaluation machinery as a black box, which means
execPartition.c can't really tweak/control the time spent evaluating
partition constraints.  Given that, we may have to disable the caching
when key->partexprs != NIL, unless we can reasonably do what you are
suggesting.
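
For the record, the guard I have in mind is roughly this (cached_rri and
cached_index are assumed names for caching state that doesn't exist yet):

    PartitionKey key = RelationGetPartitionKey(dispatch->reldesc);

    /*
     * Only trust the cached partition when the key has no expression
     * columns; otherwise ExecPartitionCheck() would re-evaluate each
     * expression, possibly costing more than a fresh lookup.
     */
    if (key->partexprs == NIL && cached_rri != NULL &&
        ExecPartitionCheck(cached_rri, slot, estate, false))
        return cached_index;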

> Besides, are we going to add a reloption or GUC to control this cache
> behaviour if we move forward with this approach?
> Because, if most of the rows to be inserted route to a different
> partition each time, then I think the extra ExecPartitionCheck
> will become overhead. Maybe it's better to apply both
> approaches (cache the last partition and skip constant partition key),
> which can achieve the best performance results.

A reloption will have to be a last resort is what I can say about this
at the moment.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Force disable of SSL renegociation in the server

2021-05-20 Thread Daniel Gustafsson
> On 20 May 2021, at 13:00, Michael Paquier  wrote:

> - SSL_OP_NO_RENEGOTIATION controls that.  It is present in OpenSSL >=
> 1.1.1 and has been backported in 1.1.0h (it is not present in older
> versions of 1.1.0).

For OpenSSL 1.1.0 versions < 1.1.0h it will be silently accepted without
actually doing anything, so we might want to combine it with the below.

> - In 1.0.2 and older versions, OpenSSL has an undocumented flag called
> SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS, able to do the same as far as I
> understand.

Well, it's documented in the changelog that it's undocumented (sigh..) along
with a note stating that it works like SSL_OP_NO_RENEGOTIATION.  Skimming the
code it seems to ring true.  For older OpenSSL versions there's also
SSL_OP_ALLOW_UNSAFE_LEGACY_RENEGOTIATION which controls renegotiation for an
older OpenSSL reneg bug.  That applies to 0.9.8 versions which we don't
support, but a malicious user can craft whatever they feel like so maybe we
should ensure it's off as well?

> Could there be a point in backpatching that, in light of what we have done in
> 48d23c72 in the past, though?

I think there is merit to that idea, especially given the precedent.

> Thoughts?

+   /* disallow SSL renegociation, option available since 1.1.0h */
s/renegociation/renegotiation/

+1 on disabling renegotiation, but I think it's worth considering using
SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS as well.  One could also argue that extending
the comment with a note that it only applies to TLSv1.2 and lower could be
helpful to readers who aren't familiar with TLS protocol versions.  TLSv1.3 did
away with renegotiation.
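
Something like the following would cover both generations; the placement
is an assumption (the first hunk at context initialization, the second
after the SSL object is created in be_tls_open_server()), not taken from
the actual patch:

    #ifdef SSL_OP_NO_RENEGOTIATION
        /* OpenSSL >= 1.1.0h: refuse client-initiated renegotiation */
        SSL_CTX_set_options(context, SSL_OP_NO_RENEGOTIATION);
    #endif

    #ifdef SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS
        /* 1.0.2 and older: undocumented flag with the same effect */
        if (port->ssl->s3 != NULL)
            port->ssl->s3->flags |= SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS;
    #endif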

--
Daniel Gustafsson   https://vmware.com/





Diagnostic comment in LogicalIncreaseXminForSlot

2021-05-20 Thread Ashutosh Bapat
Hi
LogicalIncreaseRestartDecodingForSlot() has a debug log to report a
new restart_lsn, but the corresponding function for catalog_xmin does not.
Here's a patch to add the same.

-- 
Best Wishes,
Ashutosh Bapat
From 5d211016367fe352871647225ead64a2215c3184 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Thu, 13 May 2021 13:40:33 +0530
Subject: [PATCH 1/2] Report new catalog_xmin candidate in
 LogicalIncreaseXminForSlot()

Similar to LogicalIncreaseRestartDecodingForSlot() add a debug message
to LogicalIncreaseXminForSlot() reporting a new catalog_xmin candidate.

We should submit this to the community.

BDR-723
---
 src/backend/replication/logical/logical.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index a43df58537..59fd16d1ef 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1562,6 +1562,8 @@ LogicalIncreaseXminForSlot(XLogRecPtr current_lsn, TransactionId xmin)
 	{
 		slot->candidate_catalog_xmin = xmin;
 		slot->candidate_xmin_lsn = current_lsn;
+		elog(DEBUG1, "got new catalog_xmin %u at %X/%X", xmin,
+			 (uint32) (current_lsn >> 32), (uint32) current_lsn);
 	}
	SpinLockRelease(&slot->mutex);
 
-- 
2.25.1



Re: Logical Replication - behavior of TRUNCATE ... CASCADE

2021-05-20 Thread Amit Kapila
On Fri, May 7, 2021 at 6:06 PM Dilip Kumar  wrote:
>
> On Mon, May 3, 2021 at 6:08 PM Bharath Rupireddy
>  wrote:
> >
> > Having said that, isn't it good if we can provide a subscription
> > (CREATE/ALTER) level option, say "cascade" (similar to other options
> > such as binary, synchronous_commit, stream), default being false; when
> > set to true, we send the upstream CASCADE option to ExecuteTruncateGuts in
> > apply_handle_truncate? It will be useful to truncate all the dependent
> > tables on the subscriber. Users will have to use it with caution
> > though.
>
> I think this could be a useful feature in some cases.  Suppose the
> subscriber has some table that is dependent on the subscribed table;
> in such a case, if the main table gets truncated, it will always error out
> on the subscriber, which is fine.  But if the user doesn't want an error and
> is fine even with the dependent table getting truncated, then there
> should be some option to set that.
>

Such a case is possible in theory, but why would the user need it? We
generally recommend having the same schema for relations between
publishers and subscribers, so won't that mean that there is less
chance of such cases? And after we have DDL replication, won't
defining a different schema for replicated objects be difficult to
maintain?

Having said that, I see a different use case of such an option which
is related to the proposal [1] where the patch provides a truncate
option to truncate tables before initial sync. The cascade option
could be useful in that feature to resolve some of the PK-FK issues
raised in that thread.

[1] - 
https://www.postgresql.org/message-id/CF3B6672-2A43-4204-A60A-68F359218A9B%40endpoint.com

-- 
With Regards,
Amit Kapila.




Re: multi-install PostgresNode fails with older postgres versions

2021-05-20 Thread Amit Kapila
On Thu, May 20, 2021 at 4:43 PM Andrew Dunstan  wrote:
>
> On 5/20/21 7:05 AM, Andrew Dunstan wrote:
> >>
> > Your version of perl is apparently too old for this. Looks like that
> > needs to be 5.22 or later: 
> >
> >
>
> However, we could probably rewrite this in a way that would work with
> your older perl and at the same time not offend perlcritic, using qx{}
> instead of an explicit open.
>
>
> Will test.
>

Okay, thanks. BTW, our docs don't seem to reflect that we need a newer
version of Perl. See [1] (version 5.8.3 or later is required).

[1] - https://www.postgresql.org/docs/devel/install-windows-full.html

-- 
With Regards,
Amit Kapila.




Re: multi-install PostgresNode fails with older postgres versions

2021-05-20 Thread Michael Paquier
On Thu, May 20, 2021 at 07:05:05AM -0400, Andrew Dunstan wrote:
> Your version of perl is apparently too old for this. Looks like that
> needs to be 5.22 or later: 

Hmm.  src/test/perl/README tells about 5.8.0.  That's quite a jump.
--
Michael


signature.asc
Description: PGP signature


Re: multi-install PostgresNode fails with older postgres versions

2021-05-20 Thread Andrew Dunstan


On 5/20/21 7:05 AM, Andrew Dunstan wrote:
> On 5/20/21 4:36 AM, Amit Kapila wrote:
>> On Thu, Apr 22, 2021 at 8:43 PM Andrew Dunstan  wrote:
>>> On 4/22/21 2:52 AM, Michael Paquier wrote:
 On Wed, Apr 21, 2021 at 10:04:40AM -0400, Andrew Dunstan wrote:
> Here's a patch with these things attended to.
 Thanks.  Reading through it, that seems pretty much fine to me.  I
 have not spent time checking _version_cmp in details though :)
>>>
>>>
>>> pushed with a couple of fixes.
>>>
>> In my windows environment (Windows 10), I am not able to successfully
>> execute taptests and the failure indicates the line by this commit
>> (4c4eaf3d  Make PostgresNode version aware). I am trying to execute
>> tests with command: vcregress.bat taptest src/test/subscription
>>
>> I am seeing below in the log file:
>> Log file: 
>> D:/WorkSpace/postgresql/src/test/subscription/tmp_check/log/001_rep_changes_publisher.log
>> List form of pipe open not implemented at
>> D:/WorkSpace/postgresql/src/test/perl/PostgresNode.pm line 1251.
>> # Looks like your test exited with 255 before it could output anything.
>>
>> Can you please let me know if I need to do something additional here?
>>
>>
>>
> Your version of perl is apparently too old for this. Looks like that
> needs to be 5.22 or later: 
>
>

However, we could probably rewrite this in a way that would work with
your older perl and at the same time not offend perlcritic, using qx{}
instead of an explicit open.


Will test.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: multi-install PostgresNode fails with older postgres versions

2021-05-20 Thread Andrew Dunstan


On 5/20/21 4:36 AM, Amit Kapila wrote:
> On Thu, Apr 22, 2021 at 8:43 PM Andrew Dunstan  wrote:
>> On 4/22/21 2:52 AM, Michael Paquier wrote:
>>> On Wed, Apr 21, 2021 at 10:04:40AM -0400, Andrew Dunstan wrote:
 Here's a patch with these things attended to.
>>> Thanks.  Reading through it, that seems pretty much fine to me.  I
>>> have not spent time checking _version_cmp in details though :)
>>
>>
>>
>> pushed with a couple of fixes.
>>
> In my windows environment (Windows 10), I am not able to successfully
> execute taptests and the failure indicates the line by this commit
> (4c4eaf3d  Make PostgresNode version aware). I am trying to execute
> tests with command: vcregress.bat taptest src/test/subscription
>
> I am seeing below in the log file:
> Log file: 
> D:/WorkSpace/postgresql/src/test/subscription/tmp_check/log/001_rep_changes_publisher.log
> List form of pipe open not implemented at
> D:/WorkSpace/postgresql/src/test/perl/PostgresNode.pm line 1251.
> # Looks like your test exited with 255 before it could output anything.
>
> Can you please let me know if I need to do something additional here?
>
>
>

Your version of perl is apparently too old for this. Looks like that
needs to be 5.22 or later: 


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Force disable of SSL renegotiation in the server

2021-05-20 Thread Michael Paquier
Hi all,

Robert has mentioned https://nvd.nist.gov/vuln/detail/CVE-2021-3449 on
the -security list, where a TLS server could crash on a crafted
renegotiation message.  We have already disabled SSL renegotiation
initialization for some time now, per 48d23c72, but we don't prevent
the server from complying should the client wish to use renegotiation.
In terms of robustness, and because SSL renegotiation has had its share
of flaws and issues over many years, it looks like it would be a good
idea to disable renegotiation in the backend (not the client, as that
may be used with older access points where renegotiation is still in
use, per an argument from Andres).

This is controlled in a way similar to SSL_OP_NO_COMPRESSION, which we
already enforce in the backend to disable SSL compression.  However,
there are a couple of compatibility tweaks regarding this one:
- SSL_OP_NO_RENEGOTIATION controls that.  It is present in OpenSSL >=
1.1.1 and has been backported to 1.1.0h (it is not present in older
versions of 1.1.0).
- In 1.0.2 and older versions, OpenSSL has an undocumented flag called
SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS, which is able to do the same, as far
as I understand.

Attached is a patch to use SSL_OP_NO_RENEGOTIATION where it exists, and
force that in the backend.  We could go further down for older OpenSSL
versions, but relying on undocumented flags looks rather unsafe here, to
say the least.  Could there be a point in backpatching this, though, in
light of what we did with 48d23c72 in the past?

Thoughts?
--
Michael
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index c4e8113241..18d116c584 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -251,6 +251,11 @@ be_tls_init(bool isServerStart)
 	/* disallow SSL compression */
 	SSL_CTX_set_options(context, SSL_OP_NO_COMPRESSION);
 
+#ifdef SSL_OP_NO_RENEGOTIATION
+	/* disallow SSL renegotiation, option available since 1.1.0h */
+	SSL_CTX_set_options(context, SSL_OP_NO_RENEGOTIATION);
+#endif
+
 	/* set up ephemeral DH and ECDH keys */
 	if (!initialize_dh(context, isServerStart))
 		goto error;


signature.asc
Description: PGP signature


RE: Skip partition tuple routing with constant partition key

2021-05-20 Thread houzj.f...@fujitsu.com


From: Amit Langote 
Sent: Wednesday, May 19, 2021 9:17 PM
> I gave a shot to implementing your idea and ended up with the attached PoC
> patch, which does pass make check-world.
> 
> I do see some speedup:
> 
> -- creates a range-partitioned table with 1000 partitions
> create unlogged table foo (a int) partition by range (a);
> select 'create unlogged table foo_' || i || ' partition of foo for values
> from (' || (i-1)*100000+1 || ') to (' || i*100000+1 || ');'
> from generate_series(1, 1000) i;
> \gexec
> 
> -- generates a 100 million record file
> copy (select generate_series(1, 100000000)) to '/tmp/100m.csv' csv;
> 
> Times for loading that file compare as follows:
> 
> HEAD:
> 
> postgres=# copy foo from '/tmp/100m.csv' csv;
> COPY 100000000
> Time: 31813.964 ms (00:31.814)
> postgres=# copy foo from '/tmp/100m.csv' csv;
> COPY 100000000
> Time: 31972.942 ms (00:31.973)
> postgres=# copy foo from '/tmp/100m.csv' csv;
> COPY 100000000
> Time: 32049.046 ms (00:32.049)
> 
> Patched:
> 
> postgres=# copy foo from '/tmp/100m.csv' csv;
> COPY 100000000
> Time: 26151.158 ms (00:26.151)
> postgres=# copy foo from '/tmp/100m.csv' csv;
> COPY 100000000
> Time: 28161.082 ms (00:28.161)
> postgres=# copy foo from '/tmp/100m.csv' csv;
> COPY 100000000
> Time: 26700.908 ms (00:26.701)
>
> I guess it would be nice if we could fit in a solution for the use case that 
> houjz
> mentioned as a special case.  BTW, houjz, could you please check if a patch 
> like
> this one helps the case you mentioned?

Thanks for the patch!
I did some test on it(using the table you provided above):

1): Test plain column in partition key.
SQL: insert into foo select 1 from generate_series(1, 1000);

HEAD:
Time: 5493.392 ms (00:05.493)

AFTER PATCH(skip constant partition key)
Time: 4198.421 ms (00:04.198)

AFTER PATCH(cache the last partition)
Time: 4484.492 ms (00:04.484)

The test results of your patch in this case looks good.
It can fit many more cases and the performance gain is nice.

---
2) Test expression in partition key

create or replace function partition_func(i int) returns int as $$
begin
return i;
end;
$$ language plpgsql immutable parallel restricted;
create unlogged table foo (a int) partition by range (partition_func(a));

SQL: insert into foo select 1 from generate_series(1, 1000);

HEAD
Time: 8595.120 ms (00:08.595)

AFTER PATCH(skip constant partition key)
Time: 4198.421 ms (00:04.198)

AFTER PATCH(cache the last partition)
Time: 12829.800 ms (00:12.830)

If we add a user-defined function to the partition key, there seems to
be a performance degradation after the patch.

I did some analysis on it: for the above test case, ExecPartitionCheck
executes three expressions: 1) key is null, 2) key > low, 3) key < top.
In this case, the "key" contains a funcexpr, and that funcexpr will be
executed three times for each row, so it brings extra overhead, which
causes the performance degradation.

IMO, improving ExecPartitionCheck seems a better solution to it: we can
calculate the key value in advance and use that value to do the bound
checks, as in the sketch below. Thoughts?
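For illustration, a minimal sketch of what I mean, reusing the existing
FormPartitionKeyDatum() and get_partition_for_tuple() helpers;
PartKeyMatchesCachedBound() is an invented name for a direct comparison
of the precomputed datum against the cached partition's bounds:

    Datum   values[PARTITION_MAX_KEYS];
    bool    isnull[PARTITION_MAX_KEYS];

    /* run the (possibly expensive) key expression exactly once per row */
    FormPartitionKeyDatum(dispatch, slot, estate, values, isnull);

    /* cheap datum comparison, no re-evaluation of the funcexpr */
    if (cached_index >= 0 &&
        PartKeyMatchesCachedBound(partdesc, cached_index, values, isnull))
        return cached_index;

    /* otherwise fall back to the regular bound search */
    return get_partition_for_tuple(dispatch, values, isnull);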



Besides, are we going to add a reloption or GUC to control this cache
behaviour if we move forward with this approach?  If most of the rows
being inserted route to a different partition each time, then the extra
ExecPartitionCheck becomes pure overhead.  Maybe it's better to apply
both approaches (cache the last partition and skip constant partition
keys), which could achieve the best performance results.

Best regards,
houzj







Re: Skip partition tuple routing with constant partition key

2021-05-20 Thread David Rowley
On Thu, 20 May 2021 at 20:49, Amit Langote  wrote:
>
> On Thu, May 20, 2021 at 9:31 AM David Rowley  wrote:
> > Wondering what your thoughts are on, instead of caching the last used
> > ResultRelInfo from the last call to ExecFindPartition(), to instead
> > cache the last looked up partition index in PartitionDescData? That
> > way we could cache lookups between statements.  Right now your caching
> > is not going to help for single-row INSERTs, for example.
>
> Hmm, addressing single-row INSERTs with something like you suggest
> might help time-range partitioning setups, because each of those
> INSERTs are likely to be targeting the same partition most of the
> time.  Is that case what you had in mind?

Yeah, I thought it would possibly be useful for RANGE partitioning. I
was a bit undecided about LIST. There seemed to be a bigger risk there
that the usage pattern would route to a different partition each time.
In my imagination, RANGE partitioning seems more likely to see
subsequent tuples heading to the same partition as the last tuple.

>  Although, in the cases
> where that doesn't help, we'd end up making a ResultRelInfo for the
> cached partition to check the partition constraint, only then to be
> thrown away because the new row belongs to a different partition.
> That overhead would not be free for sure.

Yeah, there's certainly above zero overhead to getting it wrong. It
would be good to see benchmarks to find out what that overhead is.

David




Re: pg_rewind fails if there is a read only file.

2021-05-20 Thread Paul Guo
> Presumably the user has a reason for adding the file read-only to the
> data directory, and we shouldn't lightly ignore that.
>
> Michael's advice is reasonable. This seems like a case of:

I agree. Attached is a short patch to handle the case. The patch was
tested in my dev environment.
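For reference, one common shape for such a fix (hedged: not necessarily
what the attached patch does) is to retry the open after making an
existing read-only destination file writable:

    /* hedged illustration only: retry the open on EACCES */
    fd = open(path, O_WRONLY | O_CREAT, pg_file_create_mode);
    if (fd < 0 && errno == EACCES)
    {
        /* the destination exists but is read-only; make it writable */
        if (chmod(path, pg_file_create_mode) == 0)
            fd = open(path, O_WRONLY | O_CREAT, pg_file_create_mode);
    }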


v1-0001-Fix-pg_rewind-failure-due-to-read-only-file-open-.patch
Description: Binary data


Re: "Multiple table synchronizations are processed serially" still happens

2021-05-20 Thread Amit Kapila
On Wed, Apr 28, 2021 at 2:43 PM Guillaume Lelarge
 wrote:
>
> And it logs the "still waiting" message as long as the first table is being 
> synchronized. Once this is done, it releases the lock, and the 
> synchronization of the second table starts.
>
> Is there something I didn't understand on the previous thread?
>

It seems from the script that you are creating the subscription on the
same node as the publication, though in a different DB, right? If so,
the problem might be that copying the data of the first table creates a
transaction which blocks creation of the slot for the second table's
copy. The commit you referred to only fixes the problem while reading
the data from the publisher, not while writing data into the table on
the subscriber.

> I'd like to know why serial synchronization happens, and if there's a way to 
> avoid it.
>

I guess you need to create a subscription on a different node.

-- 
With Regards,
Amit Kapila.




Re: Skip partition tuple routing with constant partition key

2021-05-20 Thread Amit Langote
On Thu, May 20, 2021 at 9:20 AM tsunakawa.ta...@fujitsu.com
 wrote:
> From: Amit Langote 
> > On Tue, May 18, 2021 at 11:11 AM houzj.f...@fujitsu.com
> >  wrote:
> > > For some big data scenario, we sometimes transfer data from one table(only
> > store not expired data)
> > > to another table(historical data) for future analysis.
> > > In this case, we import data into historical table regularly(could be one 
> > > day or
> > half a day),
> > > And the data is likely to be imported with date label specified, then all 
> > > of the
> > data to be
> > > imported this time belong to the same partition which partition by time 
> > > range.
> >
> > Is directing that data directly into the appropriate partition not an
> > acceptable solution to address this particular use case?  Yeah, I know
> > we should avoid encouraging users to perform DML directly on
> > partitions, but...
>
> Yes, I want to make/keep it possible that application developers can be 
> unaware of partitions.  I believe that's why David-san, Alvaro-san, and you 
> have made great efforts to improve partitioning performance.  So, I'm +1 for 
> what Hou-san is trying to achieve.

I'm very glad to see such discussions on the list, because it means
the partitioning feature is being stretched to cover a wider set of
use cases.

> Is there something you're concerned about?  The amount and/or complexity of 
> added code?

IMHO, a patch that implements caching more generally would be better
even if it adds some complexity.  Hou-san's patch seemed centered
around the use case where all rows being loaded in a given command
route to the same partition, a very specialized case I'd say.

Maybe we can extract the logic in Hou-san's patch to check the
constant-ness of the targetlist producing the rows to insert and find
a way to add it to the patch I posted such that the generality of the
latter's implementation is not lost.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Slow standby snapshot

2021-05-20 Thread Kirill Reshke
sorry, forgot to add a patch to the letter



чт, 20 мая 2021 г. в 13:52, Кирилл Решке :

> Hi,
> I recently ran into a problem in one of our production postgresql cluster.
> I had noticed lock contention on procarray lock on standby, which causes
> WAL replay lag growth.
> To reproduce this, you can do the following:
>
> 1) set max_connections to a big number, like 10
> 2) begin a transaction on primary
> 3) start pgbench workload on primary and on standby
>
> After a while it will be possible to see KnownAssignedXidsGetAndSetXmin in
> perf top consuming about 75% of CPU.
>
> %%
>   PerfTop:1060 irqs/sec  kernel: 0.0%  exact:  0.0% [4000Hz cycles:u],
>  (target_pid: 273361)
>
> ---
>
> 73.92%  postgres   [.] KnownAssignedXidsGetAndSetXmin
>  1.40%  postgres   [.] base_yyparse
>  0.96%  postgres   [.] LWLockAttemptLock
>  0.84%  postgres   [.] hash_search_with_hash_value
>  0.84%  postgres   [.] AtEOXact_GUC
>  0.72%  postgres   [.] ResetAllOptions
>  0.70%  postgres   [.] AllocSetAlloc
>  0.60%  postgres   [.] _bt_compare
>  0.55%  postgres   [.] core_yylex
>  0.42%  libc-2.27.so   [.] __strlen_avx2
>  0.23%  postgres   [.] LWLockRelease
>  0.19%  postgres   [.] MemoryContextAllocZeroAligned
>  0.18%  postgres   [.] expression_tree_walker.part.3
>  0.18%  libc-2.27.so   [.] __memmove_avx_unaligned_erms
>  0.17%  postgres   [.] PostgresMain
>  0.17%  postgres   [.] palloc
>  0.17%  libc-2.27.so   [.] _int_malloc
>  0.17%  postgres   [.] set_config_option
>  0.17%  postgres   [.] ScanKeywordLookup
>  0.16%  postgres   [.] _bt_checkpage
>
> %%
>
>
> We have tried to fix this by using a Bitmapset instead of the boolean
> array KnownAssignedXidsValid, but this did not help much.
>
> Instead, using a doubly linked list helps a little more: we got +1000 tps
> on the pgbench workload with patched postgresql. The general idea of this
> patch is that, instead of memorizing which elements in KnownAssignedXids
> are valid, we maintain a doubly linked list of them. This solution works
> in exactly the same way, except that taking a snapshot on the replica is
> now O(running transactions) instead of O(head - tail), which is
> significantly faster under some workloads. The patch helps to reduce the
> CPU usage of KnownAssignedXidsGetAndSetXmin to ~48% instead of ~74%, but
> does not eliminate it from perf top.
>
> The problem is better reproduced on PG13 since PG14 has some snapshot
> optimization.
>
> Thanks!
>
> Best regards, reshke
>
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 5ff8cab394..165cf56ea9 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -255,9 +255,18 @@ static PGPROC *allProcs;
  * Bookkeeping for tracking emulated transactions in recovery
  */
 static TransactionId *KnownAssignedXids;
-static bool *KnownAssignedXidsValid;
+
+typedef struct {
+	int prv;
+	int nxt;
+} KnownAssignedXidValidNode;
+
+/* Doubly linked list of valid xids */
+static KnownAssignedXidValidNode *KnownAssignedXidsValidDLL;
 static TransactionId latestObservedXid = InvalidTransactionId;
 
+
+
 /*
  * If we're in STANDBY_SNAPSHOT_PENDING state, standbySnapshotPendingXmin is
  * the highest xid that might still be running that we don't have in
@@ -348,6 +357,9 @@ static void GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons);
 /*
  * Report shared-memory space needed by CreateSharedProcArray.
  */
+
+static KnownAssignedXidValidNode KAX_E_INVALID;
+
 Size
 ProcArrayShmemSize(void)
 {
@@ -380,13 +392,20 @@ ProcArrayShmemSize(void)
 		size = add_size(size,
 		mul_size(sizeof(TransactionId),
  TOTAL_MAX_CACHED_SUBXIDS));
-		size = add_size(size,
-		mul_size(sizeof(bool), TOTAL_MAX_CACHED_SUBXIDS));
+size = add_size(size,
+		mul_size(sizeof(KnownAssignedXidValidNode), TOTAL_MAX_CACHED_SUBXIDS));
 	}
 
+KAX_E_INVALID.prv = -1;
+KAX_E_INVALID.nxt = TOTAL_MAX_CACHED_SUBXIDS;
+
 	return size;
 }
 
+#define KAX_DLL_INDEX_VALID(i) (-1 < i && i < TOTAL_MAX_CACHED_SUBXIDS)
+#define KAX_DLL_ENTRY_INVALID(i) (-1 == KnownAssignedXidsValidDLL[i].prv && KnownAssignedXidsValidDLL[i].nxt == TOTAL_MAX_CACHED_SUBXIDS)
+
+
 /*
  * Initialize the shared PGPROC array during postmaster startup.
  */
@@ -431,10 +450,15 @@ CreateSharedProcArray(void)
 			mul_size(sizeof(TransactionId),
 	 TOTAL_MAX_CACHED_SUBXIDS),
 			);
-		KnownAssignedXidsValid = (bool *)
-			ShmemInitStruct("KnownAssignedXidsValid",
-			mul_size(sizeof(bool), TOTAL_MAX_CACHED_SUBXIDS),
+		
+KnownAssignedXidsValidDLL = (KnownAssignedXidValidNode*)
+			ShmemInitStruct("KnownAssignedXidsValidDLL",
+			mul_size(sizeof(KnownAssignedXidValidNode), TOTAL_MAX_CACHED_SUBXIDS),
 			);
+
+			for 

Re: Inaccurate error message when set fdw batch_size to 0

2021-05-20 Thread Bharath Rupireddy
On Thu, May 20, 2021 at 1:44 PM Kyotaro Horiguchi
 wrote:
>
> At Wed, 19 May 2021 21:48:56 +0530, Bharath Rupireddy 
>  wrote in
> > On Wed, May 19, 2021 at 5:20 PM Fujii Masao  
> > wrote:
> > > I'm fine to convert "non-negative" word to "greater than" or "greater than
> > > or equal to" in the messages. But this change also seems to get rid of
> > > the information about the data type of the option from the message.
> > > I'm not sure if this is an improvement. Probably isn't it better to
> > > convert "requires a non-negative integer value" to "must be an integer 
> > > value
> > > greater than zero"?
> >
> > Thanks for the comments. Done that way. PSA v3 patch.
>
> --- a/src/backend/utils/adt/tsquery_op.c
> +++ b/src/backend/utils/adt/tsquery_op.c
> @@ -121,7 +121,7 @@ tsquery_phrase_distance(PG_FUNCTION_ARGS)
> if (distance < 0 || distance > MAXENTRYPOS)
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> -errmsg("distance in phrase operator should 
> be non-negative and less than %d",
> +errmsg("distance in phrase operator must be 
> an integer value greater than or equal to zero and less than %d",
> MAXENTRYPOS)));
>
> Though it is not due to this patch, the message looks wrong. The
> condition suggests:
>
> "distance in phrase operator must be an integer value greater than or equal 
> to zero and less than or equal to %d"
>
> I'm not sure readers can read it without biting their tongue.  How
> about something like the following instead?
>
> "distance in phrase operator must be an integer value between zero and
>  %d inclusive."

Thanks. That looks better. PSA v4 patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From 3dea08a9f6a0a39ea48ea2db67791d46db7997a9 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Thu, 20 May 2021 14:23:03 +0530
Subject: [PATCH v4] Disambiguate error messages that use "non-negative"

Using "non-negative" word in the error message looks ambiguous
since value 0 can't be considered as either a positive or negative
integer and some users might think "positive" is the same as
"non-negative". So, be clear with the following messages:
if (foo <= 0) then use "foo must be greater than zero" and
if (foo < 0) then use "foo must be greater than or equal to zero"

Also, added a note in the Error Message Style Guide to help writing
the new messages.

Authors: Hou Zhijie, Bharath Rupireddy.
---
 contrib/postgres_fdw/option.c   |  8 
 doc/src/sgml/sources.sgml   | 14 ++
 src/backend/access/transam/parallel.c   |  1 -
 src/backend/partitioning/partbounds.c   |  4 ++--
 src/backend/utils/adt/tsquery_op.c  |  2 +-
 src/bin/scripts/vacuumdb.c  |  2 +-
 src/test/modules/test_shm_mq/test.c | 10 +-
 src/test/regress/expected/hash_part.out |  4 ++--
 8 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 672b55a808..a0d771a16b 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -118,7 +118,7 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 		else if (strcmp(def->defname, "fdw_startup_cost") == 0 ||
  strcmp(def->defname, "fdw_tuple_cost") == 0)
 		{
-			/* these must have a non-negative numeric value */
+			/* these must have a numeric value greater than or equal to zero */
 			double		val;
 			char	   *endp;
 
@@ -126,7 +126,7 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 			if (*endp || val < 0)
 ereport(ERROR,
 		(errcode(ERRCODE_SYNTAX_ERROR),
-		 errmsg("%s requires a non-negative numeric value",
+		 errmsg("\"%s\" must be a numeric value greater than or equal to zero",
 def->defname)));
 		}
 		else if (strcmp(def->defname, "extensions") == 0)
@@ -142,7 +142,7 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 			if (fetch_size <= 0)
 ereport(ERROR,
 		(errcode(ERRCODE_SYNTAX_ERROR),
-		 errmsg("%s requires a non-negative integer value",
+		 errmsg("\"%s\" must be an integer value greater than zero",
 def->defname)));
 		}
 		else if (strcmp(def->defname, "batch_size") == 0)
@@ -153,7 +153,7 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 			if (batch_size <= 0)
 ereport(ERROR,
 		(errcode(ERRCODE_SYNTAX_ERROR),
-		 errmsg("%s requires a non-negative integer value",
+		 errmsg("\"%s\" must be an integer value greater than zero",
 def->defname)));
 		}
 		else if (strcmp(def->defname, "password_required") == 0)
diff --git a/doc/src/sgml/sources.sgml b/doc/src/sgml/sources.sgml
index 3f2c40b750..b9c632a349 100644
--- a/doc/src/sgml/sources.sgml
+++ b/doc/src/sgml/sources.sgml
@@ -880,6 +880,20 @@ BETTER: unrecognized node type: 42

   
 
+  
+   Avoid Using the Word non-negative in Error Messages
+
+   
+Do not use the word non-negative in error messages 

Re: Adaptive Plan Sharing for PreparedStmt

2021-05-20 Thread Tomas Vondra

Hi,

On 5/20/21 5:43 AM, Andy Fan wrote:

Currently we are using a custom/generic strategy to handle the data skew
issue. However, it doesn't work well all the time. For example: SELECT *
FROM t WHERE a between $1 and $2. We assume the selectivity is 0.0025,
but users may provide a large range every time. Per our current strategy,
a generic plan will be chosen, and an index scan on a will be used. Oops.



Yeah, the current logic is rather simple, which is however somewhat on 
purpose, as it makes the planning very cheap. But it also means there's 
very little info to check/compare and so we may make mistakes.



I think Oracle's Adaptive Cursor Sharing approach should work. First it
calculates the selectivity with the real bind values and generates or
reuses a plan based on the similarity of the selectivities. The
challenges I can think of now are: a) how to define the similarity;
b) how to adjust the similarity during the real run. For example, say we
treat [1% ~ 10%] as similar, but then find that selectivity 20% used the
same plan as 10%; what should be done here?



IMO the big question is how expensive this would be. Calculating the 
selectivities for real values (i.e. for each query) is not expensive, 
but it's not free either. So even if we compare the selectivities in 
some way and skip the actual query planning, it's still going to impact 
the prepared statements.


Also, we currently don't have any mechanism to extract the selectivities 
from the whole query - not sure how complex that would be, as it may 
involve e.g. join selectivities.



As for how to define the similarity, I doubt there's a simple and 
sensible/reliable way to do that :-(


I remember reading a paper about query planning in which the parameter 
space was divided into regions with the same plan. In this case the 
parameters are selectivities for all the query operations. So what we 
might do is this:


1) Run the first N queries and extract the selectivities / plans.

2) Build "clusters" of selecitivies with the same plan.

3) Before running a query, see if its selectivities fall into one of
the existing clusters. If yes, use that plan. If not, do regular
planning, add the new point to the data set, and repeat (2).


I have no idea how expensive would this be, and I assume the "clusters" 
may have fairly complicated shapes (not simple convex regions).
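Just to make step 3 concrete under the simplest possible assumption
(axis-aligned bounding boxes per plan; every name here is invented):

    #include <stdbool.h>
    #include <stddef.h>

    #define MAX_SELS 8          /* max selectivities tracked per query */

    typedef struct PlanCluster
    {
        double  lo[MAX_SELS];   /* per-dimension minimum seen */
        double  hi[MAX_SELS];   /* per-dimension maximum seen */
        int     ndims;
        void   *plan;           /* plan cached for this region */
    } PlanCluster;

    /*
     * Return the cached plan whose box contains the selectivity vector,
     * or NULL to force regular planning (which would then extend or add
     * a cluster, i.e. repeat step 2).
     */
    static void *
    lookup_cluster(PlanCluster *clusters, int nclusters,
                   const double *sels, int ndims)
    {
        for (int c = 0; c < nclusters; c++)
        {
            PlanCluster *cl = &clusters[c];
            bool         inside = (cl->ndims == ndims);

            for (int d = 0; inside && d < ndims; d++)
                inside = (sels[d] >= cl->lo[d] && sels[d] <= cl->hi[d]);

            if (inside)
                return cl->plan;
        }
        return NULL;
    }

Real clusters would of course need something smarter than boxes, per
the caveat above.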



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




RE: Forget close an open relation in ReorderBufferProcessTXN()

2021-05-20 Thread osumi.takami...@fujitsu.com
On Tuesday, May 18, 2021 3:30 PM Amit Langote  wrote:
> While doing so, it occurred to me (maybe not for the first time) that we are
> *unnecessarily* doing send_relation_and_attrs() for a relation if the changes
> will be published using an ancestor's schema.  In that case, sending only the
> ancestor's schema suffices AFAICS.  Changing the code that way doesn't
> break any tests.  I propose that we fix that too.
I've analyzed the validity of this new change.
My conclusion is that it has no bad impact,
which means your additional fix is acceptable.
I think this addition blurs the purpose of the patch a bit, though.

With the patch's removal of that send_relation_and_attrs() call,
we no longer send a LOGICAL_REP_MSG_TYPE('Y') /
LOGICAL_REP_MSG_RELATION('R') message pair to the subscriber
when an ancestor's schema is used. Therefore, with the patch we no
longer register or update the type and relation for maybe_send_schema()'s
argument 'relation' in the case where the ancestor's schema is used.
However, both pgoutput_change() and pgoutput_truncate()
have conditions that check which OIDs to send to the subscriber for each
operation. Accordingly, the message pair for that argument 'relation'
isn't used on the subscriber in that case, and we are fine.

I'll comment on other minor things in another email.


Best Regards,
Takamichi Osumi



Slow standby snapshot

2021-05-20 Thread Кирилл Решке
Hi,
I recently ran into a problem in one of our production postgresql cluster.
I had noticed lock contention on procarray lock on standby, which causes
WAL replay lag growth.
To reproduce this, you can do the following:

1) set max_connections to a big number, like 10
2) begin a transaction on primary
3) start pgbench workload on primary and on standby

After a while it will be possible to see KnownAssignedXidsGetAndSetXmin in
perf top consuming about 75% of CPU.

%%
  PerfTop:1060 irqs/sec  kernel: 0.0%  exact:  0.0% [4000Hz cycles:u],
 (target_pid: 273361)
---

73.92%  postgres   [.] KnownAssignedXidsGetAndSetXmin
 1.40%  postgres   [.] base_yyparse
 0.96%  postgres   [.] LWLockAttemptLock
 0.84%  postgres   [.] hash_search_with_hash_value
 0.84%  postgres   [.] AtEOXact_GUC
 0.72%  postgres   [.] ResetAllOptions
 0.70%  postgres   [.] AllocSetAlloc
 0.60%  postgres   [.] _bt_compare
 0.55%  postgres   [.] core_yylex
 0.42%  libc-2.27.so   [.] __strlen_avx2
 0.23%  postgres   [.] LWLockRelease
 0.19%  postgres   [.] MemoryContextAllocZeroAligned
 0.18%  postgres   [.] expression_tree_walker.part.3
 0.18%  libc-2.27.so   [.] __memmove_avx_unaligned_erms
 0.17%  postgres   [.] PostgresMain
 0.17%  postgres   [.] palloc
 0.17%  libc-2.27.so   [.] _int_malloc
 0.17%  postgres   [.] set_config_option
 0.17%  postgres   [.] ScanKeywordLookup
 0.16%  postgres   [.] _bt_checkpage

%%


We have tried to fix this by using a Bitmapset instead of the boolean
array KnownAssignedXidsValid, but this did not help much.

Instead, using a doubly linked list helps a little more: we got +1000 tps
on the pgbench workload with patched postgresql. The general idea of this
patch is that, instead of memorizing which elements in KnownAssignedXids
are valid, we maintain a doubly linked list of them. This solution works
in exactly the same way, except that taking a snapshot on the replica is
now O(running transactions) instead of O(head - tail), which is
significantly faster under some workloads. The patch helps to reduce the
CPU usage of KnownAssignedXidsGetAndSetXmin to ~48% instead of ~74%, but
does not eliminate it from perf top.
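The O(1) part is the usual doubly-linked-list unlink. A minimal
standalone sketch of the invariant the patch maintains (field names as
in the patch; -1 and TOTAL_MAX_CACHED_SUBXIDS are the head/tail
sentinels; invalidate_entry() is an invented name):

    static void
    invalidate_entry(KnownAssignedXidValidNode *list, int i)
    {
        int     prv = list[i].prv;
        int     nxt = list[i].nxt;

        /* splice entry i out of the valid list in O(1) */
        if (prv >= 0)
            list[prv].nxt = nxt;
        if (nxt < TOTAL_MAX_CACHED_SUBXIDS)
            list[nxt].prv = prv;

        /* mark i itself invalid (matches KAX_DLL_ENTRY_INVALID) */
        list[i].prv = -1;
        list[i].nxt = TOTAL_MAX_CACHED_SUBXIDS;
    }

so that KnownAssignedXidsGetAndSetXmin only ever walks valid entries.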

The problem is better reproduced on PG13 since PG14 has some snapshot
optimization.

Thanks!

Best regards, reshke


Re: Skip partition tuple routing with constant partition key

2021-05-20 Thread Amit Langote
On Thu, May 20, 2021 at 9:31 AM David Rowley  wrote:
> On Thu, 20 May 2021 at 01:17, Amit Langote  wrote:
> > I gave a shot to implementing your idea and ended up with the attached
> > PoC patch, which does pass make check-world.
>
> I only had a quick look at this.
>
> + if ((dispatch->key->strategy == PARTITION_STRATEGY_RANGE ||
> + dispatch->key->strategy == PARTITION_STRATEGY_RANGE))
> + dispatch->lastPartInfo = rri;
>
> I think you must have meant to have one of these as PARTITION_STRATEGY_LIST?

Oops, of course.  Fixed in the attached.

> Wondering what your thoughts are on, instead of caching the last used
> ResultRelInfo from the last call to ExecFindPartition(), to instead
> cache the last looked up partition index in PartitionDescData? That
> way we could cache lookups between statements.  Right now your caching
> is not going to help for single-row INSERTs, for example.

Hmm, addressing single-row INSERTs with something like you suggest
might help time-range partitioning setups, because each of those
INSERTs are likely to be targeting the same partition most of the
time.  Is that case what you had in mind?  Although, in the cases
where that doesn't help, we'd end up making a ResultRelInfo for the
cached partition to check the partition constraint, only then to be
thrown away because the new row belongs to a different partition.
That overhead would not be free for sure.

> For multi-level partition hierarchies that would still require looping
> and checking the cached value at each level.

Yeah, there's no getting around that, though maybe that's not a big problem.

> I've not studied the code that builds and rebuilds PartitionDescData,
> so there may be some reason that we shouldn't do that. I know that's
> changed a bit recently with DETACH CONCURRENTLY.  However, providing
> the cached index is not outside the bounds of the oids array, it
> shouldn't really matter if the cached value happens to end up pointing
> to some other partition. If that happens, we'll just fail the
> ExecPartitionCheck() and have to look for the correct partition.

Yeah, as long as ExecFindPartition performs ExecPartitionCheck()
before returning a given cached partition, there's no need to worry
about the cached index getting stale for whatever reason, as the
sketch below illustrates.
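In rough pseudo-C, the lookup would become something like this, where
last_found_part stands in for whatever field ends up holding the cached
index and lookup_result_rel() is an invented helper; ExecPartitionCheck()
and get_partition_for_tuple() are the existing functions:

    /* try the partition the previous tuple routed to first */
    int     idx = partdesc->last_found_part;    /* -1 when unset */

    if (idx >= 0 && idx < partdesc->nparts)
    {
        ResultRelInfo *rri = lookup_result_rel(proute, dispatch, idx);

        /* the recheck keeps a stale cached index harmless */
        if (ExecPartitionCheck(rri, slot, estate, false))
            return rri;
    }

    /* cache miss or stale entry: do the normal bound search */
    idx = get_partition_for_tuple(dispatch, values, isnull);
    partdesc->last_found_part = idx;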

-- 
Amit Langote
EDB: http://www.enterprisedb.com


ExecFindPartition-cache-partition-PoC_v2.patch
Description: Binary data


Re: multi-install PostgresNode fails with older postgres versions

2021-05-20 Thread Amit Kapila
On Thu, Apr 22, 2021 at 8:43 PM Andrew Dunstan  wrote:
>
> On 4/22/21 2:52 AM, Michael Paquier wrote:
> > On Wed, Apr 21, 2021 at 10:04:40AM -0400, Andrew Dunstan wrote:
> >> Here's a patch with these things attended to.
> > Thanks.  Reading through it, that seems pretty much fine to me.  I
> > have not spent time checking _version_cmp in details though :)
>
>
>
>
> pushed with a couple of fixes.
>

In my windows environment (Windows 10), I am not able to successfully
execute taptests and the failure indicates the line by this commit
(4c4eaf3d  Make PostgresNode version aware). I am trying to execute
tests with command: vcregress.bat taptest src/test/subscription

I am seeing below in the log file:
Log file: 
D:/WorkSpace/postgresql/src/test/subscription/tmp_check/log/001_rep_changes_publisher.log
List form of pipe open not implemented at
D:/WorkSpace/postgresql/src/test/perl/PostgresNode.pm line 1251.
# Looks like your test exited with 255 before it could output anything.

Can you please let me know if I need to do something additional here?



--
With Regards,
Amit Kapila.




Re: [PATCH v3 1/1] Fix detection of preadv/pwritev support for OSX.

2021-05-20 Thread Dave Page
On Tue, Mar 30, 2021 at 6:58 AM Tom Lane  wrote:

> Thomas Munro  writes:
> > I'll move it when committing.  I'll let this patch sit for another day
> > to see if any other objections show up.
>
> FWIW, I remain fairly strongly against this, precisely because of the
> point that it requires us to start using a randomly different
> feature-probing technology anytime Apple decides that they're going to
> implement some standard API that they didn't before.  Even if it works
> everywhere for preadv/pwritev (which we won't know in advance of
> buildfarm testing, and maybe not then, since detection failures will
> probably be silent), it seems likely that we'll hit some case in the
> future where this interacts badly with some other platform's weirdness.
> We haven't claimed in the past to support MACOSX_DEPLOYMENT_TARGET,
> and I'm not sure we should start now.  How many people actually care
> about that?
>

I missed this earlier - it's come to my attention through a thread on the
-packagers list. Adding my response on that thread here for this audience:

The ability to target older releases with a newer SDK is essential for
packages such as the EDB PostgreSQL installers and the pgAdmin community
installers. It's very difficult (sometimes impossible) to get older OS
versions on new machines now - Apple make it very hard to download old
versions of macOS (some can be found, others not), and they won't always
work on newer hardware anyway so it's really not feasible to have all the
build machines running the oldest version that needs to be supported.

FYI, the pgAdmin and PG installer buildfarms have
-mmacosx-version-min=10.12 in CFLAGS etc. to handle this, which is
synonymous with MACOSX_DEPLOYMENT_TARGET. We've been successfully building
packages that way for a decade or more.

-- 
Dave Page
Blog: https://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: https://www.enterprisedb.com


Re: Inaccurate error message when set fdw batch_size to 0

2021-05-20 Thread Kyotaro Horiguchi
At Wed, 19 May 2021 21:48:56 +0530, Bharath Rupireddy 
 wrote in 
> On Wed, May 19, 2021 at 5:20 PM Fujii Masao  
> wrote:
> > I'm fine to convert "non-negative" word to "greater than" or "greater than
> > or equal to" in the messages. But this change also seems to get rid of
> > the information about the data type of the option from the message.
> > I'm not sure if this is an improvement. Probably isn't it better to
> > convert "requires a non-negative integer value" to "must be an integer value
> > greater than zero"?
> 
> Thanks for the comments. Done that way. PSA v3 patch.

--- a/src/backend/utils/adt/tsquery_op.c
+++ b/src/backend/utils/adt/tsquery_op.c
@@ -121,7 +121,7 @@ tsquery_phrase_distance(PG_FUNCTION_ARGS)
if (distance < 0 || distance > MAXENTRYPOS)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-errmsg("distance in phrase operator should be 
non-negative and less than %d",
+errmsg("distance in phrase operator must be an 
integer value greater than or equal to zero and less than %d",
MAXENTRYPOS)));

Though it is not due to this patch, but the message looks wrong. The condition 
is suggesting:

"distance in phrase operator must be an integer value greater than or equal to 
zero and less than or equal to %d"

I'm not sure readers can read it without biting their tongue.  How
about something like the following instead?

"distance in phrase operator must be an integer value between zero and
 %d inclusive."

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: "ERROR: deadlock detected" when replicating TRUNCATE

2021-05-20 Thread tanghy.f...@fujitsu.com
On Thursday, May 20, 2021 3:05 PM, Amit Kapila  wrote:
> Okay, I have prepared the patches for all branches (11...HEAD). Each
> version needs minor changes in the test, the code doesn't need much
> change. Some notable changes in the tests:
> 1. I have removed the conf change for max_logical_replication_workers
> on the publisher node. We only need this for the subscriber node.
> 2. After creating the new subscriptions wait for initial
> synchronization as we do for other tests.
> 3. synchronous_standby_names need to be reset for the previous test.
> This is only required for HEAD.
> 4. In PG-11, we need to specify the application_name in the connection
> string, otherwise, it took the testcase file name as application_name.
> This is the same as other tests are doing in PG11.
> 
> Can you please once verify the attached patches?

I have tested your patches for all branches (11...HEAD). All of them passed.
BTW, I also confirmed that the bug exists in these branches without your fix.

The changes in tests LGTM.
But I saw whitespace warnings when applying the patches for PG11 and PG12;
please take a look at that.

Regards
Tang


Re: Query about time zone patterns in to_char

2021-05-20 Thread Nitin Jadhav
Thanks Suraj for reviewing the patch.

> 1:
> +RESET timezone;
> +
> +
> CREATE TABLE TIMESTAMPTZ_TST (a int , b timestamptz);
>
> Extra line.
>
> 2:
> +SET timezone = '00:00';
> +SELECT to_char(now(), 'of') as "Of", to_char(now(), 'tzh:tzm') as
"tzh:tzm";

I have fixed these comments.

> I am not sure whether we should backport this or not but I don't see any
issues with back-patching.

I am also not sure about this. If it is really required, I would like to
create those patches.

Please find the patch attached. Kindly confirm and share comments if any.

--
Thanks & Regards,
Nitin Jadhav



On Thu, May 20, 2021 at 8:55 AM Suraj Kharage <
suraj.khar...@enterprisedb.com> wrote:

> +1 for the change.
>
> I quickly reviewed the patch and overall it looks good to me.
> Few cosmetic suggestions:
>
> 1:
> +RESET timezone;
> +
> +
>  CREATE TABLE TIMESTAMPTZ_TST (a int , b timestamptz);
>
> Extra line.
>
> 2:
> +SET timezone = '00:00';
> +SELECT to_char(now(), 'of') as "Of", to_char(now(), 'tzh:tzm') as
> "tzh:tzm";
>
> O should be small in alias just for consistency.
>
> I am not sure whether we should backport this or not but I don't see any
> issues with back-patching.
>
> On Sun, May 16, 2021 at 9:43 PM Nitin Jadhav <
> nitinjadhavpostg...@gmail.com> wrote:
>
>> > AFAICS, table 9.26 specifically shows which case-variants are supported.
>> > If there are some others that happen to work, we probably shouldn't
>> > remove them for fear of breaking poorly-written apps ... but that does
>> > not imply that we need to support every case-variant.
>>
>> Thanks for the explanation. I also feel that we may not support every
>> case-variant. But the other reason which triggered me to think in the
>> other way is, as mentioned in commit [1] where this feature was added,
>> says that these format patterns are compatible with Oracle. Whereas
>> Oracle supports both upper case and lower case patterns. I just wanted
>> to get it confirmed with this point before concluding.
>>
>> [1] -
>> commit 11b623dd0a2c385719ebbbdd42dd4ec395dcdc9d
>> Author: Andrew Dunstan 
>> Date:   Tue Jan 9 14:25:05 2018 -0500
>>
>> Implement TZH and TZM timestamp format patterns
>>
>> These are compatible with Oracle and required for the datetime
>> template
>> language for jsonpath in an upcoming patch.
>>
>> Nikita Glukhov and Andrew Dunstan, reviewed by Pavel Stehule.
>>
>> Thanks & Regards,
>> Nitin Jadhav
>>
>> On Sun, May 16, 2021 at 8:40 PM Tom Lane  wrote:
>> >
>> > Nitin Jadhav  writes:
>> > > While understanding the behaviour of the to_char() function as
>> > > explained in [1], I observed that some patterns related to time zones
>> > > do not display values if we mention in lower case. As shown in the
>> > > sample output [2], time zone related patterns TZH, TZM and OF outputs
>> > > proper values when specified in upper case but does not work if we
>> > > mention in lower case. But other patterns like TZ, HH, etc works fine
>> > > with upper case as well as lower case.
>> >
>> > > I would like to know whether the current behaviour of TZH, TZM and OF
>> > > is done intentionally and is as expected.
>> >
>> > AFAICS, table 9.26 specifically shows which case-variants are supported.
>> > If there are some others that happen to work, we probably shouldn't
>> > remove them for fear of breaking poorly-written apps ... but that does
>> > not imply that we need to support every case-variant.
>> >
>> > regards, tom lane
>>
>>
>>
>
> --
> --
>
> Thanks & Regards,
> Suraj kharage,
>
>
>
> edbpostgres.com
>


v2_support_of_tzh_tzm_patterns.patch
Description: Binary data


Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-20 Thread Greg Nancarrow
On Thu, May 20, 2021 at 11:18 AM Pengchengliu  wrote:
>
> Hi Greg,
>Thanks a lot for your explanation and your fix.
>
>I think your fix can resolve the core dump issue. With your fix, the
> parallel process resets the transaction xmin from the ActiveSnapshot.
>But it will change the transaction snapshot for all parallel scenarios,
> and I don't know whether that brings in other issues.
>For testing only, I think it is enough.
>
>So can anybody explain what exactly the difference is between
> ActiveSnapshot and TransactionSnapshot in a parallel worker process?
>Then maybe we can find a better solution and try to fix it properly.
>

I am proposing the attached patch to fix this issue (patches for both
PG13.2 and latest PG14 source are provided).

Perhaps this will prompt others who know the intricacies of snapshot
handling better, and the reasons and history behind why the
transaction_snapshot and active_snapshot are currently passed
separately to parallel workers, to speak up here, discuss the issue
further, and check my fix (note that I haven't attempted to modify
README.parallel in the patch until I get further clarification on this
issue).
Perhaps someone can explain the purpose of calling
GetTransactionSnapshot() AGAIN in InitializeParallelDSM(), and how this
is consistent with the current ActiveSnapshot?
AFAICS, it doesn't seem correct, and that's why the patch removes it.
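To be clear about the direction (a hedged sketch of the idea, not the
exact patch text): inside InitializeParallelDSM(), serialize the
snapshot that is actually active rather than taking a second,
potentially inconsistent transaction snapshot:

    /* sketch only: use the active snapshot for the workers */
    Snapshot    active_snapshot = GetActiveSnapshot();

    SerializeSnapshot(active_snapshot, asnapspace);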

I've rebuilt Postgres with the patch applied and run the regression
tests, with and without "force_parallel_mode=regress", and all tests
pass.
So if the fix is wrong, no tests currently exist that detect issues with it.

Regards,
Greg Nancarrow
Fujitsu Australia


v1-0001-PG14-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data


v1-0001-PG13_2-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data


Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2021-05-20 Thread Michael Paquier
On Tue, May 18, 2021 at 10:49:39AM +0900, Michael Paquier wrote:
> Makes sense.  For now, I'll update this patch set so as it is possible
> to use custom dumps, as an option in parallel of pg_regress when
> specifying a different source code path.  I'll also decouple the
> business with probin updates and stick with the approach used by the
> buildfarm code.

This has proved to not be that difficult.  With the updated version
attached, pg_upgrade has two modes to set up the old instance used for
the upgrade with older binaries:
- With oldsrc and oldinstall set, pg_regress gets used, same way as
HEAD.
- With olddump and oldinstall set, an old dump is loaded instead in
the old instance before launching the upgrade.

oldsrc and olddump are exclusive options.  Similarly to HEAD, the
dumps taken from the old and new instances generate diffs that can be
inspected manually.  The updates of probin are done without any
dependencies to the source path of the old instance, copying from the
buildfarm.

While on it, I have fixed a couple of things that exist in test.sh but
were not reflected in this new script:
- Handling of postfix operations with ~13 clusters.
- Handling oldstyle_length for ~9.6 clusters.
- Handling of EXTRA_REGRESS_OPTS.

This stuff still needs to be expanded depending on how PostgresNode is
made backward-compatible, but I'll wait for that to happen before
going further down here.  I have also spent some time testing all that
with MSVC, and the installation paths used for pg_regress make the
script a tad more confusing, so I have dropped this part for now.

Thanks,
--
Michael
From bfb0086cfc35c063acc671896a7ffcb2c23dc2c2 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 20 May 2021 15:04:55 +0900
Subject: [PATCH v2] Switch tests of pg_upgrade to use TAP

---
 src/bin/pg_upgrade/Makefile|  17 +-
 src/bin/pg_upgrade/TESTING |  29 ++-
 src/bin/pg_upgrade/t/001_basic.pl  |   9 +
 src/bin/pg_upgrade/t/002_pg_upgrade.pl | 266 
 src/bin/pg_upgrade/test.sh | 272 -
 src/test/perl/PostgresNode.pm  |  25 +++
 6 files changed, 323 insertions(+), 295 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/001_basic.pl
 create mode 100644 src/bin/pg_upgrade/t/002_pg_upgrade.pl
 delete mode 100644 src/bin/pg_upgrade/test.sh

diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 44d06be5a6..fa8dee0a9c 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -49,17 +49,8 @@ clean distclean maintainer-clean:
 	   pg_upgrade_dump_globals.sql \
 	   pg_upgrade_dump_*.custom pg_upgrade_*.log
 
-# When $(MAKE) is present, make automatically infers that this is a
-# recursive make. which is not actually what we want here, as that
-# e.g. prevents output synchronization from working (as make thinks
-# that the subsidiary make knows how to deal with that itself, but
-# we're invoking a shell script that doesn't know). Referencing
-# $(MAKE) indirectly avoids that behaviour.
-# See https://www.gnu.org/software/make/manual/html_node/MAKE-Variable.html#MAKE-Variable
-NOTSUBMAKEMAKE=$(MAKE)
+check:
+	$(prove_check)
 
-check: test.sh all temp-install
-	MAKE=$(NOTSUBMAKEMAKE) $(with_temp_install) bindir=$(abs_top_builddir)/tmp_install/$(bindir) EXTRA_REGRESS_OPTS="$(EXTRA_REGRESS_OPTS)" $(SHELL) $<
-
-# installcheck is not supported because there's no meaningful way to test
-# pg_upgrade against a single already-running server
+installcheck:
+	$(prove_installcheck)
diff --git a/src/bin/pg_upgrade/TESTING b/src/bin/pg_upgrade/TESTING
index e69874b42d..185943dd4b 100644
--- a/src/bin/pg_upgrade/TESTING
+++ b/src/bin/pg_upgrade/TESTING
@@ -4,19 +4,26 @@ THE SHORT VERSION
 On non-Windows machines, you can execute the testing process
 described below by running
 	make check
-in this directory.  This will run the shell script test.sh, performing
-an upgrade from the version in this source tree to a new instance of
-the same version.
+in this directory.  This will run the TAP tests to run pg_upgrade,
+performing an upgrade from the version in this source tree to a
+new instance of the same version.
 
-To test an upgrade from a different version, you must have a built
-source tree for the old version as well as this version, and you
-must have done "make install" for both versions.  Then do:
+To test an upgrade from a different version, there are two options
+available:
+
+1) You have a built source tree for the old version as well as this
+version's binaries.  Then set up the following variables:
 
 export oldsrc=...somewhere/postgresql	(old version's source tree)
-export oldbindir=...otherversion/bin	(old version's installed bin dir)
-export bindir=...thisversion/bin	(this version's installed bin dir)
-export libdir=...thisversion/lib	(this version's installed lib dir)
-sh test.sh
+export oldinstall=...otherversion/bin	(old version's install base path)
+
+2) You have a dump that can 

Re: "ERROR: deadlock detected" when replicating TRUNCATE

2021-05-20 Thread Amit Kapila
On Tue, May 18, 2021 at 6:52 AM Peter Smith  wrote:
>
> >
> > Yeah, have you checked it in the back branches?
> >
>
> Yes, the apply_handle_truncate function was introduced in April/2018 [1].
>
> REL_11_0 was tagged in Oct/2018.
>
> The "ERROR:  deadlock detected" log is reproducible in PG 11.0.
>

Okay, I have prepared the patches for all branches (11...HEAD). Each
version needs minor changes in the test, the code doesn't need much
change. Some notable changes in the tests:
1. I have removed the conf change for max_logical_replication_workers
on the publisher node. We only need this for the subscriber node.
2. After creating the new subscriptions wait for initial
synchronization as we do for other tests.
3. synchronous_standby_names need to be reset for the previous test.
This is only required for HEAD.
4. In PG-11, we need to specify the application_name in the connection
string, otherwise, it took the testcase file name as application_name.
This is the same as other tests are doing in PG11.

Can you please once verify the attached patches?

-- 
With Regards,
Amit Kapila.


v4-0001-Fix-deadlock-for-multiple-replicating-truncates-11.patch
Description: Binary data


v4-0001-Fix-deadlock-for-multiple-replicating-truncates-12.patch
Description: Binary data


v4-0001-Fix-deadlock-for-multiple-replicating-truncates-13.patch
Description: Binary data


v4-0001-Fix-deadlock-for-multiple-replicating-truncates-HEAD.patch
Description: Binary data