RE: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

2021-01-06 Thread Shinya11.Kato
>Thanks for the review, and sorry for the late reply.
>
>>I reviewed the patch and found some problems.
>>>+ if(startSegNo != endSegNo)
>>>+ else if(record->ReadRecPtr / XLOG_BLCKSZ !=
>>>+ if(rmid == RM_XLOG_ID && info == XLOG_SWITCH)
>>>+ if(ri == RM_XLOG_ID)
>>>+ if(info == XLOG_SWITCH)
>>You need to put a space after the "if".
>All fixed, and thanks for pointing out the issue.
>
>>>@@ -24,6 +24,7 @@
>>>#include "common/logging.h"
>>>#include "getopt_long.h"
>>>#include "rmgrdesc.h"
>>>+#include "catalog/pg_control.h"
>>I think the include statements should be arranged in alphabetical order.
>Fixed.

Thank you for your revision!

>>>+ info = (rj << 4) & ~XLR_INFO_MASK;
>>>+ if(info == XLOG_SWITCH)
>>>+ XLogDumpStatsRow(psprintf("XLOG/SWITCH_JUNK"),
>>>+ 0, total_count, stats->junk_size, total_rec_len,
>>>+ 0, total_fpi_len, stats->junk_size, total_len);
>
>>Can't this be described in the same way as "XLogDumpStatsRow(psprintf("%s/%s", 
>>desc->rm_name, id)..."?
>>Only this part looks strange.
>>Why are the "count" and "fpi_len" fields 0?
>The 'SWITCH_JUNK' is not a real record; it depends on the 'XLOG_SWITCH' record,
>so I think we can't count 'SWITCH_JUNK', so the "count" is 0. And it never
>contains FPI, so the "fpi_len" is 0.
>
>But a 0 value maybe looks strange, so in the current version I show it like below:
>Type                  N      (%)  Record size     (%)  FPI size    (%)  Combined size     (%)
>----                  -      ---  -----------     ---  --------    ---  -------------     ---
>...
>XLOG/SWITCH_JUNK      -     ( -)     11006248 ( 72.26)        -    ( -)       11006248 ( 65.78)
>Transaction/COMMIT   10  ( 0.03)          340 (  0.00)        0 ( 0.00)            340 (  0.00)
>

I just wanted to know why the "count" and "fpi_len" fields are 0.
So, it would be nice to have 0 values. Sorry for confusing you.

Regards,
Shinya Kato



Re: Single transaction in the tablesync worker?

2021-01-06 Thread Peter Smith
Hi Amit.

PSA the v12 patch for the Tablesync Solution1.

Differences from v11:
  + Added PG docs to mention the tablesync slot
  + Refactored tablesync slot drop (done by
DropSubscription/process_syncing_tables_for_sync)
  + Fixed PG docs mentioning wrong state code
  + Fixed wrong code comment describing TCOPYDONE state



Features:

* The tablesync slot is now permanent instead of temporary. The
tablesync slot name is no longer tied to the Subscription slot name.

* The tablesync slot cleanup (drop) code is added for DropSubscription
and for process_syncing_tables_for_sync functions

* The tablesync worker now allows multiple transactions instead of a single transaction

* A new state (SUBREL_STATE_TCOPYDONE) is persisted after a successful
copy_table in LogicalRepSyncTableStart.

* If a re-launched tablesync finds state SUBREL_STATE_TCOPYDONE then
it will bypass the initial copy_table phase.

* Now tablesync sets up replication origin tracking in
LogicalRepSyncTableStart (similar to what is done for the apply worker).
The origin is advanced when first created.

* The tablesync replication origin tracking is cleaned up during
DropSubscription and/or process_syncing_tables_for_apply.

* The DropSubscription cleanup code was enhanced (v7+) to take care of
any crashed tablesync workers.

* Updates to PG docs

TODO / Known Issues:

* Address review comments

* Patch applies with whitespace warning

---

Kind Regards,
Peter Smith.
Fujitsu Australia


v12-0001-Tablesync-Solution1.patch
Description: Binary data


v12-0002-Tablesync-extra-logging.patch
Description: Binary data


Re: CheckpointLock needed in CreateCheckPoint()?

2021-01-06 Thread Amul Sul
On Thu, Jan 7, 2021 at 12:45 PM Bharath Rupireddy
 wrote:
>
> On Thu, Jan 7, 2021 at 11:32 AM Amul Sul  wrote:
> > Snip from CreateCheckPoint():
> > --
> > /*
> >  * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
> >  * (This is just pro forma, since in the present system structure there is
> >  * only one process that is allowed to issue checkpoints at any given
> >  * time.)
> >  */
> > LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
> > --
> >
> > As per this comment, it seems to be that we don't really need this LW lock. 
> > We
> > could have something else instead if we are afraid of having multiple
> > checkpoints at any given time which isn't possible, btw.
>
> Looks like CheckpointLock() can also be called in standalone backends
> (RequestCheckpoint->CreateCheckPoint) along with the checkpointer
> process. Having said that, I'm not quite sure whether it can get
> called at the same time from a backend(may be with checkpoint;
> command) and the checkpointer process.
>
> /*
>  * If in a standalone backend, just do it ourselves.
>  */
> if (!IsPostmasterEnvironment)
> {
> /*
>  * There's no point in doing slow checkpoints in a standalone backend,
>  * because there's no other backends the checkpoint could disrupt.
>  */
> CreateCheckPoint(flags | CHECKPOINT_IMMEDIATE);
>
> See the below comment for IsPostmasterEnvironment:
>
> /*
>  * IsPostmasterEnvironment is true in a postmaster process and any postmaster
>  * child process; it is false in a standalone process (bootstrap or
>  * standalone backend).
>

I am not sure I understood your point completely.  Do you mean there could be
multiple bootstrap or standalone backends existing at a time that could
perform a checkpoint?

> > This CheckpointLock is acquired almost at the start of CreateCheckPoint() 
> > and
> > released at the end. The in-between code can take a while to be complete. 
> > All
> > that time interrupt will be on hold which doesn't seem to be a great idea 
> > but
> > that wasn't problematic until now.  I am having trouble due to this 
> > interrupt
> > hold for a long since I was trying to add code changes where the 
> > checkpointer
> > process is suppose to respond to the system read-write state change request 
> > as
> > soon as possible[1]. That cannot be done if the interrupt is disabled
> > since read-write state change uses the global barrier mechanism to convey 
> > system
> > state changes to other running processes.
> >
> > So my question is, can we completely get rid of the CheckpointLock need in
> > CreateCheckPoint()?
> >
> > Thoughts/Comments/Suggestions?
>
> I don't think we can completely get rid of CheckpointLock in
> CreateCheckPoint given the fact that it can get called from multiple
> processes.
>

How?

> How about serving that ASRO barrier request alone for checkpointer
> process in ProcessInterrupts even though the CheckpointLock is held by
> the checkpointer process? And also while the checkpointing is
> happening, may be we should call CHECK_FOR_INTERRUPTS to see if the
> ASRO barrier has arrived. This may sound bit hacky, but just a
> thought. I'm thinking that in ProcessInterrupts we can get to know the
> checkpointer process type via MyAuxProcType, and whether the lock is
> held or not using CheckpointLock, and then we can handle the ASRO
> global barrier request.
>

I am afraid that this will easily get rejected by the community.  Note that this
is a very crucial code path of the database server. There are other options
too, but I don't want to propose them until there is clarity on whether
CheckpointLock is really needed.

Regards,
Amul




Re: Incorrect allocation handling for cryptohash functions with OpenSSL

2021-01-06 Thread Heikki Linnakangas

On 07/01/2021 06:15, Michael Paquier wrote:

On Wed, Jan 06, 2021 at 03:58:22PM +0200, Heikki Linnakangas wrote:

contrib/pgcrypto/internal-sha2.c and
src/backend/utils/adt/cryptohashfuncs.c: the call to pg_cryptohash_create()
is missing check for NULL result. With your latest patch, that's OK because
the subsequent pg_cryptohash_update() calls will fail if passed a NULL
context. But seems sloppy.


Is it?  pg_cryptohash_create() will never return NULL for the backend.


Ah, you're right.


src/backend/libpq/auth-scram.c: SHA256 is used in the mock authentication.
If the pg_cryptohash functions fail, we throw a distinct error "could not
encode salt" that reveals that it was a mock authentication. I don't think
this is a big deal in practice, it would be hard for an attacker to induce
the SHA256 computation to fail, and there are probably better ways to
distinguish mock authentication from real, like timing attacks. But still.


This maps with the second error in the mock routine, so wouldn't it be
better to change both and back-patch?  The last place where this error
message is used is pg_be_scram_build_secret() for the generation of
what's stored in pg_authid.  An idea would be to use "out of memory".
That can be faced for any palloc() calls.


Hmm. Perhaps it would be best to change all the errors in mock 
authentication to COMMERROR. It'd be nice to have an accurate error 
message in the log, but no need to send it to the client.



src/include/common/checksum_helper.h: in pg_checksum_raw_context, do we
still need separate fields for the different sha variants.


Using separate fields looked cleaner to me if it came to debugging,
and the cleanup was rather minimal except if we use more switch/case
to set up the various variables.  What about something like the
attached?


+1

- Heikki




Re: CheckpointLock needed in CreateCheckPoint()?

2021-01-06 Thread Dilip Kumar
On Thu, Jan 7, 2021 at 12:45 PM Bharath Rupireddy
 wrote:
>
> On Thu, Jan 7, 2021 at 11:32 AM Amul Sul  wrote:
> > Snip from CreateCheckPoint():
> > --
> > /*
> >  * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
> >  * (This is just pro forma, since in the present system structure there is
> >  * only one process that is allowed to issue checkpoints at any given
> >  * time.)
> >  */
> > LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
> > --
> >
> > As per this comment, it seems to be that we don't really need this LW lock. 
> > We
> > could have something else instead if we are afraid of having multiple
> > checkpoints at any given time which isn't possible, btw.
>
> Looks like CheckpointLock() can also be called in standalone backends
> (RequestCheckpoint->CreateCheckPoint) along with the checkpointer
> process. Having said that, I'm not quite sure whether it can get
> called at the same time from a backend(may be with checkpoint;
> command) and the checkpointer process.

If it is a standalone backend then there will be no postmaster and no
other processes, i.e. no checkpointer process either.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: A failure of standby to follow timeline switch

2021-01-06 Thread Kyotaro Horiguchi
At Thu, 7 Jan 2021 11:55:33 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2021/01/05 17:26, Kyotaro Horiguchi wrote:
> > At Mon, 4 Jan 2021 19:00:21 +0900, Fujii Masao 
> > wrote in
> >>
> >>
> >> On 2021/01/04 12:06, Kyotaro Horiguchi wrote:
> >>> At Sat, 26 Dec 2020 02:15:06 +0900, Fujii Masao
> >>>  wrote in
> 
>  On 2020/12/25 12:03, Kyotaro Horiguchi wrote:
> >>> The attached is the fixed version.
> >>
> >> Thanks for updating the patches!
> >>
> >>> In the first patch, the test added to 001_stream_rep.pl involves two
> >>> copied functions related to server-log investigation from
> >>> 019_repslot_limit.pl.
> >>
> >> So you're planning to define them commonly in TestLib.pm or elsewhere?
> > Yeah.. That's correct. Newly added as the first patch.
> > While making that change, I extended the interface of slurp_file to
> > allow reading from arbitrary position.
> 
> Is this extension really helpful for the current use case?
> At least I'd like to avoid back-patching this since it's an extension...

Yeah, I hesitated about it a bit. It's less useful, assuming that
log files won't get very large. Removed in this version.

>   OsFHandleOpen(my $fh = IO::Handle->new(), $fHandle, 'r')
> or croak "could not read \"$filename\": $^E\n";
> + seek($fh, $from, 0)
> +   or croak "could not seek \"$filename\" to $from: $^E\n";
> 
> I'm not familiar with this area, but SetFilePointer() is more suitable
> rather than seek()?

SetFilePointer() works on a native handle, while seek() works on $fh, the
Perl handle created by IO::Handle->new() here.  If ReadFile were used later,
SetFilePointer() might be needed separately.

Anyway, it is removed.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 39f284c2109b63e444eb8d437fe51ae3268d305b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 5 Jan 2021 13:34:36 +0900
Subject: [PATCH v4 1/3] Move TAP log-searching feature to common modules

Some private functions in 019_repslot_limit.pl will be used in other
tests. Move them to common modules so that they are available to the
new tests.
---
 src/test/perl/PostgresNode.pm | 36 ++
 src/test/recovery/t/019_replslot_limit.pl | 37 ---
 2 files changed, 42 insertions(+), 31 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667e..d78e9f93fb 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2223,6 +2223,42 @@ sub pg_recvlogical_upto
 
 =pod
 
+=item $node->current_log_position()
+
+Return the current position of server log.
+
+=cut
+
+sub current_log_position
+{
+	my $self = shift;
+
+	return (stat $self->logfile)[7];
+}
+
+=pod
+
+=item $node->find_in_log($pattern, $startpos)
+
+Returns whether the $pattern occurs after $startpos in the server log.
+
+=cut
+
+sub find_in_log
+{
+	my ($self, $pattern, $startpos) = @_;
+
+	$startpos = 0 unless defined $startpos;
+	my $log = TestLib::slurp_file($self->logfile);
+	return 0 if (length($log) <= $startpos);
+
+	$log = substr($log, $startpos);
+
+	return $log =~ m/$pattern/;
+}
+
+=pod
+
 =back
 
 =cut
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 20f4a1bbc3..b298d9992f 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -165,19 +165,17 @@ $node_primary->wait_for_catchup($node_standby, 'replay', $start_lsn);
 
 $node_standby->stop;
 
-ok( !find_in_log(
-		$node_standby,
-		"requested WAL segment [0-9A-F]+ has already been removed"),
+ok( !$node_standby->find_in_log(
+		 "requested WAL segment [0-9A-F]+ has already been removed"),
 	'check that required WAL segments are still available');
 
 # Advance WAL again, the slot loses the oldest segment.
-my $logstart = get_log_size($node_primary);
+my $logstart = $node_primary->current_log_position();
 advance_wal($node_primary, 7);
 $node_primary->safe_psql('postgres', "CHECKPOINT;");
 
 # WARNING should be issued
-ok( find_in_log(
-		$node_primary,
+ok( $node_primary->find_in_log(
 		"invalidating slot \"rep1\" because its restart_lsn [0-9A-F/]+ exceeds max_slot_wal_keep_size",
 		$logstart),
 	'check that the warning is logged');
@@ -190,14 +188,13 @@ is($result, "rep1|f|t|lost|",
 	'check that the slot became inactive and the state "lost" persists');
 
 # The standby no longer can connect to the primary
-$logstart = get_log_size($node_standby);
+$logstart = $node_standby->current_log_position();
 $node_standby->start;
 
 my $failed = 0;
 for (my $i = 0; $i < 1; $i++)
 {
-	if (find_in_log(
-			$node_standby,
+	if ($node_standby->find_in_log(
 			"requested WAL segment [0-9A-F]+ has already been removed",
 			$logstart))
 	{
@@ -264,25 +261,3 @@ sub advance_wal
 	}
 	return;
 }
-
-# return the size of logfile of $node in bytes
-sub get_log_size
-{
-	my ($node) = @_;
-
-	return (stat $node->logfile)[7];
-}
-
-# find $pat in logfile of $node 

Re: CheckpointLock needed in CreateCheckPoint()?

2021-01-06 Thread Bharath Rupireddy
On Thu, Jan 7, 2021 at 11:32 AM Amul Sul  wrote:
> Snip from CreateCheckPoint():
> --
> /*
>  * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
>  * (This is just pro forma, since in the present system structure there is
>  * only one process that is allowed to issue checkpoints at any given
>  * time.)
>  */
> LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
> --
>
> As per this comment, it seems to be that we don't really need this LW lock. We
> could have something else instead if we are afraid of having multiple
> checkpoints at any given time which isn't possible, btw.

Looks like CheckpointLock can also be taken in standalone backends
(RequestCheckpoint->CreateCheckPoint) along with the checkpointer
process. Having said that, I'm not quite sure whether it can get
taken at the same time from a backend (maybe with a CHECKPOINT
command) and the checkpointer process.

/*
 * If in a standalone backend, just do it ourselves.
 */
if (!IsPostmasterEnvironment)
{
/*
 * There's no point in doing slow checkpoints in a standalone backend,
 * because there's no other backends the checkpoint could disrupt.
 */
CreateCheckPoint(flags | CHECKPOINT_IMMEDIATE);

See the below comment for IsPostmasterEnvironment:

/*
 * IsPostmasterEnvironment is true in a postmaster process and any postmaster
 * child process; it is false in a standalone process (bootstrap or
 * standalone backend).

> This CheckpointLock is acquired almost at the start of CreateCheckPoint() and
> released at the end. The in-between code can take a while to be complete. All
> that time interrupt will be on hold which doesn't seem to be a great idea but
> that wasn't problematic until now.  I am having trouble due to this interrupt
> hold for a long since I was trying to add code changes where the checkpointer
> process is suppose to respond to the system read-write state change request as
> soon as possible[1]. That cannot be done if the interrupt is disabled
> since read-write state change uses the global barrier mechanism to convey 
> system
> state changes to other running processes.
>
> So my question is, can we completely get rid of the CheckpointLock need in
> CreateCheckPoint()?
>
> Thoughts/Comments/Suggestions?

I don't think we can completely get rid of CheckpointLock in
CreateCheckPoint given the fact that it can get called from multiple
processes.

How about serving that ASRO barrier request alone for the checkpointer
process in ProcessInterrupts, even though CheckpointLock is held by
the checkpointer process? And while the checkpointing is
happening, maybe we should call CHECK_FOR_INTERRUPTS to see if the
ASRO barrier has arrived. This may sound a bit hacky, but it's just a
thought. I'm thinking that in ProcessInterrupts we can find out that we
are the checkpointer process via MyAuxProcType, check whether
CheckpointLock is held, and then handle the ASRO global barrier request.
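(For illustration only, here is a rough sketch of that idea -- untested, and
it simply reuses names from the existing global-barrier machinery
(ProcSignalBarrierPending, ProcessProcSignalBarrier); it is not code from any
posted patch:)

    /* hypothetical special case near the top of ProcessInterrupts() */
    if (ProcSignalBarrierPending &&
        AmCheckpointerProcess() &&
        LWLockHeldByMe(CheckpointLock))
    {
        /*
         * Absorb pending global barrier requests (such as the ASRO one)
         * even though CheckpointLock is still held by the checkpointer.
         */
        ProcessProcSignalBarrier();
    }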

I may be missing something.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion

2021-01-06 Thread Masahiko Sawada
On Thu, Jan 7, 2021 at 1:30 PM Fujii Masao  wrote:
>
>
>
> On 2021/01/07 12:42, Masahiko Sawada wrote:
> > On Thu, Jan 7, 2021 at 10:59 AM Fujii Masao  
> > wrote:
> >>
> >>
> >>
> >> On 2021/01/07 10:01, Masahiko Sawada wrote:
> >>> On Wed, Jan 6, 2021 at 3:37 PM  wrote:
> 
> > +#define Query_for_list_of_cursors \
> > +" SELECT name FROM pg_cursors"\
> >
> > This query should be the following?
> >
> > " SELECT pg_catalog.quote_ident(name) "\
> > "   FROM pg_catalog.pg_cursors "\
> > "  WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'"
> >
> > +/* CLOSE */
> > +  else if (Matches("CLOSE"))
> > +  COMPLETE_WITH_QUERY(Query_for_list_of_cursors
> > +  " UNION ALL 
> > SELECT 'ALL'");
> >
> > "UNION ALL" should be "UNION"?
> 
>  Thank you for your advice, and I corrected them.
> 
> >> +   COMPLETE_WITH_QUERY(Query_for_list_of_cursors
> >> +   " UNION SELECT 'ABSOLUTE'"
> >> +   " UNION SELECT 'BACKWARD'"
> >> +   " UNION SELECT 'FORWARD'"
> >> +   " UNION SELECT 'RELATIVE'"
> >> +   " UNION SELECT 'ALL'"
> >> +   " UNION SELECT 'NEXT'"
> >> +   " UNION SELECT 'PRIOR'"
> >> +   " UNION SELECT 'FIRST'"
> >> +   " UNION SELECT 'LAST'"
> >> +   " UNION SELECT 'FROM'"
> >> +   " UNION SELECT 'IN'");
> >>
> >> This change makes psql unexpectedly output "FROM" and "IN" just after "FETCH".
> >
> > I think "FROM" and "IN" can be placed just after "FETCH". According to
> > the documentation, the direction can be empty. For instance, we can do
> > like:
> 
>  Thank you!
> 
> > I've attached the patch improving the tab completion for DECLARE as an
> > example. What do you think?
> >
> > BTW according to the documentation, the options of DECLARE statement
> > (BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive.
> >
> > DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ]
> >  CURSOR [ { WITH | WITHOUT } HOLD ] FOR query
> >
> > But I realized that these options are actually order-insensitive. For
> > instance, we can declare a cursor like:
> >
> > =# declare abc scroll binary cursor for select * from pg_class;
> > DECLARE CURSOR
> >
> > The both parser code and documentation has been unchanged from 2003.
> > Is it a documentation bug?
> 
>  Thank you for your patch, and it is good.
>  I cannot find the description "(BINARY, INSENSITIVE, SCROLL, and NO 
>  SCROLL) are order-sensitive."
>  I saw "The key words BINARY, INSENSITIVE, and SCROLL can appear in any 
>  order." , according to the documentation.
> >>>
> >>> Thanks, you're right. I missed that sentence. I still don't think the
> >>> syntax of DECLARE statement in the documentation doesn't match its
> >>> implementation but I agree that it's order-insensitive.
> >>>
>  I made a new patch, but the amount of codes was large due to 
>  order-insensitive.
> >>>
> >>> Thank you for updating the patch!
> >>>
> >>> Yeah, I'm also afraid a bit that conditions will exponentially
> >>> increase when a new option is added to DECLARE statement in the
> >>> future. Looking at the parser code for DECLARE statement, we can put
> >>> the same options multiple times (that's also why I don't think it
> >>> matches). For instance,
> >>>
> >>> postgres(1:44758)=# begin;
> >>> BEGIN
> >>> postgres(1:44758)=# declare test binary binary binary cursor for
> >>> select * from pg_class;
> >>> DECLARE CURSOR
> >>>
> >>> So how about simplify the above code as follows?
> >>>
> >>> @@ -3005,8 +3014,23 @@ psql_completion(const char *text, int start, int 
> >>> end)
> >>>   else if (Matches("DECLARE", MatchAny))
> >>>   COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
> >>> "CURSOR");
> >>> +   /*
> >>> +* Complete DECLARE  with one of BINARY, INSENSITIVE, SCROLL,
> >>> +* NO SCROLL, and CURSOR. The tail doesn't match any keywords for
> >>> +* DECLARE, assume we want options.
> >>> +*/
> >>> +   else if (HeadMatches("DECLARE", MatchAny, "*") &&
> >>> +

Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Amit Kapila
On Thu, Jan 7, 2021 at 3:32 AM Peter Geoghegan  wrote:
>
> On Wed, Jan 6, 2021 at 1:29 PM Michael Banck  
> wrote:
> > That one seems to be 5min everywhere, and one can change it except on
> > Azure.
>
> Okay, thanks for clearing that up. Looks like all of the big 3 cloud
> providers use Postgres checksums in a straightforward way.
>

But they might have done something to reduce the impact of enabling
checksums, such as using a different checksum technique (for data blocks)
and/or a different compression technique (for WAL).

> I don't have much more to say on this thread. I am -1 on the current
> proposal to enable page-level checksums by default.
>

-1 from me too, given the impact on performance and WAL volume it can
currently have. I was looking at an old thread related to this topic and came
across the benchmarking done by Tomas Vondra [1]. It clearly shows
that enabling checksums can have a huge impact on init time, WAL, and
TPS.

Having said that, if we really want to enable checksums, can't we
think of improving performance when they are enabled? I can think of
two things to improve: (a) a better algorithm for WAL compression
(currently we use pglz), which would allow us to enable wal_compression
at least when data_checksums is enabled; (b) a better checksum technique
to reduce the cost of PageSetChecksumCopy. I don't have good ideas to
offer in these two areas, but I think it is worth investigating if we
want to enable checksums.
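(For reference, the two knobs being discussed are set as follows -- checksums
are chosen at initdb time and wal_compression is an ordinary GUC; the data
directory path below is just a placeholder:)

    initdb -D /path/to/data --data-checksums

    # postgresql.conf
    wal_compression = on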

[1] - https://www.postgresql.org/message-id/20190330192543.GH4719%40development

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Identify LWLocks in tracepoints

2021-01-06 Thread Craig Ringer
On Mon, 28 Dec 2020 at 20:09, Masahiko Sawada  wrote:

> Hi Craig,
>
> On Sat, Dec 19, 2020 at 2:00 PM Craig Ringer
>  wrote:
> >
> > Hi all
> >
> > The attached patch set follows on from the discussion in [1] "Add LWLock
> blocker(s) information" by adding the actual LWLock* and the numeric
> tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.
> >
> > This does not provide complete information on blockers, because it's not
> necessarily valid to compare any two LWLock* pointers between two process
> address spaces. The locks could be in DSM segments, and those DSM segments
> could be mapped at different addresses.
> >
> > I wasn't able to work out a sensible way to map a LWLock* to any sort of
> (tranche-id, lock-index) because there's no requirement that locks in a
> tranche be contiguous or known individually to the lmgr.
> >
> > Despite that, the patches improve the information available for LWLock
> analysis significantly.
> >
> > Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be
> fired from LWLockWaitForVar, despite that function never actually acquiring
> the lock.
> >
> > Patch 2 adds the tranche id and lock pointer for each trace hit. This
> makes it possible to differentiate between individual locks within a
> tranche, and (so long as they aren't tranches in a DSM segment) compare
> locks between processes. That means you can do lock-order analysis etc,
> which was not previously especially feasible. Traces also don't have to do
> userspace reads for the tranche name all the time, so the trace can run
> with lower overhead.
> >
> > Patch 3 adds a single-path tracepoint for all lock acquires and
> releases, so you only have to probe the lwlock__acquired and
> lwlock__release events to see all acquires/releases, whether conditional or
> otherwise. It also adds start markers that can be used for timing wallclock
> duration of LWLock acquires/releases.
> >
> > Patch 4 adds some comments on LWLock tranches to try to address some
> points I found confusing and hard to understand when investigating this
> topic.
> >
>
> You sent in your patch to pgsql-hackers on Dec 19, but you did not
> post it to the next CommitFest[1].  If this was intentional, then you
> need to take no action.  However, if you want your patch to be
> reviewed as part of the upcoming CommitFest, then you need to add it
> yourself before 2021-01-01 AoE[2]. Thanks for your contributions.
>
> Regards,
>
> [1] https://commitfest.postgresql.org/31/
> [2] https://en.wikipedia.org/wiki/Anywhere_on_Earth
>

Thanks.

CF entry created at https://commitfest.postgresql.org/32/2927/ . I don't
think it's urgent, and I will have limited review time, so I didn't try to
wedge it into the current CF.


CheckpointLock needed in CreateCheckPoint()?

2021-01-06 Thread Amul Sul
Hi ALL,

Snip from CreateCheckPoint():
--
/*
 * Acquire CheckpointLock to ensure only one checkpoint happens at a time.
 * (This is just pro forma, since in the present system structure there is
 * only one process that is allowed to issue checkpoints at any given
 * time.)
 */
LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
--

As per this comment, it seems that we don't really need this LWLock. We
could have something else instead if we are worried about multiple
checkpoints happening at any given time, which isn't possible anyway.

CheckpointLock is acquired almost at the start of CreateCheckPoint() and
released at the end. The code in between can take a while to complete. All
that time, interrupts are held off, which doesn't seem like a great idea, but
it wasn't problematic until now.  I am having trouble with this long interrupt
hold-off because I am trying to add code changes where the checkpointer
process is supposed to respond to a system read-write state change request as
soon as possible [1]. That cannot be done while interrupts are disabled,
since the read-write state change uses the global barrier mechanism to convey
system state changes to other running processes.

So my question is, can we completely get rid of the CheckpointLock need in
CreateCheckPoint()?

Thoughts/Comments/Suggestions?

Thank you!

Regards,
Amul

[1] http://postgr.es/m/CA+TgmoYexwDQjdd1=15KMz+7VfHVx8VHNL2qjRRK92P=csz...@mail.gmail.com




Re: [PATCH] Runtime control of CLOBBER_CACHE_ALWAYS

2021-01-06 Thread Craig Ringer
On Wed, 6 Jan 2021 at 18:23, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:

> On 2021-01-06 00:30, Craig Ringer wrote:
> > Perhaps debug_invalidate_system_caches_always ? It's a bit long but we
> > use the debug prefix elsewhere too.
>
> Committed with that name.
>

Thanks very much.
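(For anyone following along, the committed GUC can then be flipped at run time
instead of requiring a special build; for example, as superuser:)

    SET debug_invalidate_system_caches_always = 1;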


> In your original patch, this hunk in pg_config_manual.h:
>
> + * You can define these by editing pg_config_manual.h but it's usually
> + * sufficient to pass CFLAGS to configure, e.g.
> + *
> + * ./configure --enable-debug --enable-cassert CFLAGS="-DUSE_VALGRIND"
> + *
> + * The flags are persisted in Makefile.global so they will be correctly
> + * applied to extensions, including those built by PGXS.
>
> I don't think that necessarily works for all settings.  Maybe we should
> make a pass through it and ensure it all works sensibly, but that seems
> like a separate project.
>

It should work for everything, since we persist the CFLAGS string, rather
than individual settings.

But I didn't mean to leave that in the same patch anyway, it's separate.


Re: Terminate the idle sessions

2021-01-06 Thread Thomas Munro
On Thu, Jan 7, 2021 at 4:51 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > One of the strange things about these errors is that they're
> > asynchronous/unsolicited, but they appear to the client to be the
> > response to their next request (if it doesn't eat ECONNRESET instead).
>
> Right, which is what makes class 57 (operator intervention) seem
> attractive to me.  From the client's standpoint these look little
> different from ERRCODE_ADMIN_SHUTDOWN or ERRCODE_CRASH_SHUTDOWN,
> which are in that category.

Yeah, that's a good argument.




Re: plpgsql variable assignment with union is broken

2021-01-06 Thread Pavel Stehule
On Thu, 7 Jan 2021 at 4:20, Merlin Moncure  wrote:

> On Tue, Jan 5, 2021 at 3:40 PM Tom Lane  wrote:
> >
> > easter...@verfriemelt.org writes:
> > > i found, that the behaviour of variable assignment in combination with
> union is not working anymore:
> > >   DO $$
> > >   DECLARE t bool;
> > >   begin
> > >   t := a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true
> AS a;
> > >   END $$;
> >
> > > is this an intended change or is it a bug?
> >
> > It's an intended change, or at least I considered the case and thought
> > that it was useless because assignment will reject any result with more
> > than one row.  Do you have any non-toy example that wouldn't be as
> > clear or clearer without using UNION?  The above sure seems like an
> > example of awful SQL code.
>
> What is the definition of broken here?  What is the behavior of the
> query with the change and why?
>
> OP's query provably returns a single row and ought to always assign
> true as written.  A real world example might evaluate multiple
> condition branches so that the assignment resolves true if any branch
> is true. It could be rewritten with 'OR' of course.
>
> Is this also "broken"?
>   t := a FROM ( SELECT 'something' WHERE _Flag) t(a) UNION SELECT
> 'something else' AS a WHERE NOT _Flag;
>
> What about this?
> SELECT INTO t true WHERE false
> UNION select true;
>

ANSI SQL allows only SELECT INTO or var := SQL expression, and the SQL
expression can be a (subquery) too:

do $$
declare t bool;
begin
  t := (SELECT true WHERE false  UNION SELECT true );
end;
$$;

Regards

Pavel


> merlin
>
>
>


Re: Terminate the idle sessions

2021-01-06 Thread Li Japin

--
Best regards
Japin Li

On Jan 7, 2021, at 10:03 AM, Thomas Munro <thomas.mu...@gmail.com> wrote:


* I'm not entirely comfortable with the name "idle_session_timeout",
because it sounds like it applies to all idle states, but actually
it only applies when we're not in a transaction.  I left the name
alone and tried to clarify the docs, but I wonder whether a different
name would be better.  (Alternatively, we could say that it does
apply in all cases, making the effective timeout when in a transaction
the smaller of the two GUCs.  But that'd be more complex to implement
and less flexible, so I'm not in favor of that answer.)

Hmm, it is a bit confusing, but having them separate is indeed more flexible.


Yes! It is a bit confusing. How about interactive_timeout? This is used by 
MySQL [1].
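(For illustration, whatever name is chosen, the new setting would be used like
the other timeout GUCs; assuming the name stays idle_session_timeout:)

    -- per session
    SET idle_session_timeout = '60s';

    -- cluster-wide
    ALTER SYSTEM SET idle_session_timeout = '10min';
    SELECT pg_reload_conf();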

* The SQLSTATE you chose for the new error condition seems pretty
random.  I do not see it in the SQL standard, so using a code that's
within the spec-reserved code range is certainly wrong.  I went with
08P02 (the "P" makes it outside the reserved range), but I'm not
really happy either with using class 08 ("Connection Exception",
which seems to be mainly meant for connection-request failures),
or the fact that ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT is
practically identical but it's not even in the same major class.
Now 25 ("Invalid Transaction State") is certainly not right for
this new error, but I think what that's telling us is that 25 was a
misguided choice for the other error.  In a green field I think I'd
put both of them in class 53 ("Insufficient Resources") or maybe class
57 ("Operator Intervention").  Can we get away with renumbering the
older error at this point?  In any case I'd be inclined to move the
new error to 53 or 57 --- anybody have a preference which?

I don't have a strong view here, but 08 with a P doesn't seem crazy to
me.  Unlike eg 57014 (statement_timeout), 57017 (deadlock_timeout),
55P03 (lock_timeout... huh, I just noticed that DB2 uses 57017 for
that, distinguished from deadlock by another error field), after these
timeouts you don't have a session/connection anymore.  The two are a
bit different though: in the older one, you were in a transaction, and
it seems to me quite newsworthy that your transaction has been
aborted, information that is not conveyed quite so clearly with a
connection-related error class.

Apologies! I just thought it was a Connection Exception.

[1] 
https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_interactive_timeout


Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion

2021-01-06 Thread Fujii Masao




On 2021/01/07 12:42, Masahiko Sawada wrote:

On Thu, Jan 7, 2021 at 10:59 AM Fujii Masao  wrote:




On 2021/01/07 10:01, Masahiko Sawada wrote:

On Wed, Jan 6, 2021 at 3:37 PM  wrote:



+#define Query_for_list_of_cursors \
+" SELECT name FROM pg_cursors"\

This query should be the following?

" SELECT pg_catalog.quote_ident(name) "\
"   FROM pg_catalog.pg_cursors "\
"  WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'"

+/* CLOSE */
+  else if (Matches("CLOSE"))
+  COMPLETE_WITH_QUERY(Query_for_list_of_cursors
+  " UNION ALL SELECT 
'ALL'");

"UNION ALL" should be "UNION"?


Thank you for your advice, and I corrected them.


+   COMPLETE_WITH_QUERY(Query_for_list_of_cursors
+   " UNION SELECT 
'ABSOLUTE'"
+   " UNION SELECT 
'BACKWARD'"
+   " UNION SELECT 
'FORWARD'"
+   " UNION SELECT 
'RELATIVE'"
+   " UNION SELECT 'ALL'"
+   " UNION SELECT 'NEXT'"
+   " UNION SELECT 'PRIOR'"
+   " UNION SELECT 'FIRST'"
+   " UNION SELECT 'LAST'"
+   " UNION SELECT 'FROM'"
+   " UNION SELECT 'IN'");

This change makes psql unexpectedly output "FROM" and "IN" just after "FETCH".


I think "FROM" and "IN" can be placed just after "FETCH". According to
the documentation, the direction can be empty. For instance, we can do
like:


Thank you!


I've attached the patch improving the tab completion for DECLARE as an
example. What do you think?

BTW according to the documentation, the options of DECLARE statement
(BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive.

DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ]
 CURSOR [ { WITH | WITHOUT } HOLD ] FOR query

But I realized that these options are actually order-insensitive. For
instance, we can declare a cursor like:

=# declare abc scroll binary cursor for select * from pg_class;
DECLARE CURSOR

The both parser code and documentation has been unchanged from 2003.
Is it a documentation bug?


Thank you for your patch; it is good.
I cannot find the description "(BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are
order-sensitive."
I saw "The key words BINARY, INSENSITIVE, and SCROLL can appear in any order.",
according to the documentation.


Thanks, you're right. I missed that sentence. I still think the
syntax of the DECLARE statement in the documentation doesn't match its
implementation, but I agree that it's order-insensitive.


I made a new patch, but the amount of code is large because the options are order-insensitive.


Thank you for updating the patch!

Yeah, I'm also a bit afraid that the conditions will increase
exponentially when a new option is added to the DECLARE statement in the
future. Looking at the parser code for the DECLARE statement, we can put
the same option multiple times (that's also why I don't think it
matches). For instance,

postgres(1:44758)=# begin;
BEGIN
postgres(1:44758)=# declare test binary binary binary cursor for
select * from pg_class;
DECLARE CURSOR

So how about simplifying the above code as follows?

@@ -3005,8 +3014,23 @@ psql_completion(const char *text, int start, int end)
  else if (Matches("DECLARE", MatchAny))
  COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
"CURSOR");
+   /*
+* Complete DECLARE  with one of BINARY, INSENSITIVE, SCROLL,
+* NO SCROLL, and CURSOR. The tail doesn't match any keywords for
+* DECLARE, assume we want options.
+*/
+   else if (HeadMatches("DECLARE", MatchAny, "*") &&
+TailMatches(MatchAnyExcept("CURSOR|WITH|WITHOUT|HOLD|FOR")))
+   COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
+ "CURSOR");


This change seems to cause "DECLARE ... CURSOR FOR SELECT " to
unexpectedly output BINARY, INSENSITIVE, etc.


Indeed. The following is not complete, but isn't it much better?


Yes, I think that's better.



@@ -3002,11 +3011,18 @@ psql_completion(const char *text, int start, int end)
 " UNION SELECT 'ALL'");

  /* DECLARE */
-   else if (Matches("DECLARE", MatchAny))
-   COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
- "CURSOR");
+   /*
+   * Complete DECLARE  with one of BINARY, INSENSITIVE, SCROLL,
+   * NO SCROLL, and CURSOR. If the tail is any one of options, assume we
+   * still want options.
+   */
+   else if (Matches("DECLARE", MatchAny) ||
+

Re: Parallel Inserts in CREATE TABLE AS

2021-01-06 Thread Zhihong Yu
Thanks for the clarification.

w.r.t. the commit message, maybe I was momentarily sidetracked by the
phrase: With this change.
It seems the scenarios you listed are known parallel safety constraints.

Probably rephrase that sentence a little bit to make this clearer.

Cheers

On Wed, Jan 6, 2021 at 8:01 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:

> On Thu, Jan 7, 2021 at 5:12 AM Zhihong Yu  wrote:
> >
> > Hi,
> > For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch :
> >
> > workers to Gather node to 0. With this change, there are chances
> > that the planner may choose the parallel plan.
> >
> > It would be nice if the scenarios where a parallel plan is not chosen
> are listed.
>
> There are many reasons, the planner may not choose a parallel plan for
> the select part, for instance if there are temporary tables, parallel
> unsafe functions, or the parallelism GUCs are not set properly,
> foreign tables and so on. see
> https://www.postgresql.org/docs/devel/parallel-safety.html. I don't
> think so, we will add all the scenarios into the commit message.
>
> Having said that, we have extensive comments in the code(especially in
> the function SetParallelInsertState) about when parallel inserts are
> chosen.
>
> + * Parallel insertions are possible only if the upper node is Gather.
>   */
> +if (!IsA(gstate, GatherState))
>  return;
>
> + * Parallelize inserts only when the upper Gather node has no
> projections.
>   */
> +if (!gstate->ps.ps_ProjInfo)
> +{
> +/* Okay to parallelize inserts, so mark it. */
> +if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +((DR_intorel *) dest)->is_parallel = true;
> +
> +/*
> + * For parallelizing inserts, we must send some information so
> that the
> + * workers can build their own dest receivers. For CTAS, this
> info is
> + * into clause, object id (to open the created table).
> + *
> + * Since the required information is available in the dest
> receiver,
> + * store a reference to it in the Gather state so that it will be
> used
> + * in ExecInitParallelPlan to pick the information.
> + */
> +gstate->dest = dest;
> +}
> +else
> +{
> +/*
> + * Upper Gather node has projections, so parallel insertions are
> not
> + * allowed.
> + */
>
> > +   if ((root->parse->parallelInsCmdTupleCostOpt &
> > +PARALLEL_INSERT_SELECT_QUERY) &&
> > +   (root->parse->parallelInsCmdTupleCostOpt &
> > +PARALLEL_INSERT_CAN_IGN_TUP_COST))
> > +   {
> > +   /* We are ignoring the parallel tuple cost, so mark it. */
> > +   root->parse->parallelInsCmdTupleCostOpt |=
> > +
>  PARALLEL_INSERT_TUP_COST_IGNORED;
> >
> > If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY and
> PARALLEL_INSERT_CAN_IGN_TUP_COST are set, PARALLEL_INSERT_TUP_COST_IGNORED
> is implied.
> >
> > Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the
> setting (1) of the first two bits should suffice.
>
> The way these flags work is as follows: before planning in CTAS, we
> set PARALLEL_INSERT_SELECT_QUERY, before we go for generating upper
> gather path we set PARALLEL_INSERT_CAN_IGN_TUP_COST, and when we
> actually ignored the tuple cost in cost_gather we set
> PARALLEL_INSERT_TUP_COST_IGNORED. There are chances that we set
> PARALLEL_INSERT_CAN_IGN_TUP_COST before calling
> generate_useful_gather_paths, and the function
> generate_useful_gather_paths can return before reaching cost_gather,
> see below snippets. So, we need the 3 flags.
>
> void
> generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool
> override_rows)
> {
> ListCell   *lc;
> doublerows;
> double   *rowsp = NULL;
> List   *useful_pathkeys_list = NIL;
> Path   *cheapest_partial_path = NULL;
>
> /* If there are no partial paths, there's nothing to do here. */
> if (rel->partial_pathlist == NIL)
> return;
>
> /* Should we override the rel's rowcount estimate? */
> if (override_rows)
> rowsp = &rows;
>
> /* generate the regular gather (merge) paths */
> generate_gather_paths(root, rel, override_rows);
>
> void
> generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool
> override_rows)
> {
> Path   *cheapest_partial_path;
> Path   *simple_gather_path;
> ListCell   *lc;
> doublerows;
> double   *rowsp = NULL;
>
> /* If there are no partial paths, there's nothing to do here. */
> if (rel->partial_pathlist == NIL)
> return;
>
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Single transaction in the tablesync worker?

2021-01-06 Thread Amit Kapila
On Wed, Jan 6, 2021 at 3:39 PM Amit Kapila  wrote:
>
> On Wed, Jan 6, 2021 at 2:13 PM Peter Smith  wrote:
> >
> > I think it makes sense. If there can be a race between the tablesync
> > re-launching (after error), and the AlterSubscription_refresh removing
> > some table’s relid from the subscription then there could be lurking
> > slot/origin tablesync resources (of the removed table) which a
> > subsequent DROP SUBSCRIPTION cannot discover. I will think more about
> > how/if it is possible to make this happen. Anyway, I suppose I ought
> > to refactor/isolate some of the tablesync cleanup code in case it
> > needs to be commonly called from DropSubscription and/or from
> > AlterSubscription_refresh.
> >
>
> Fair enough.
>

I think before implementing, we should first try to reproduce this
case. I understand this is a timing issue and can be reproduced only
with the help of a debugger, but we should do that.

> BTW, I have analyzed whether we need any modifications to
> pg_dump/restore for this patch as this changes the state of one of the
> fields in the system table and concluded that we don't need any
> change. For subscriptions, we don't dump any of the information from
> pg_subscription_rel, rather we just dump subscriptions with the
> connect option as false which means users need to enable the
> subscription and refresh publication after restore. I have checked
> this in the code and tested it as well. The related information is
> present in pg_dump doc page [1], see from "When dumping logical
> replication subscriptions ".
>
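(For reference, "enable the subscription and refresh publication after restore"
amounts to something like the following, with a hypothetical subscription name:)

    ALTER SUBSCRIPTION mysub ENABLE;
    ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;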

I have further analyzed that we don't need to do anything w.r.t.
pg_upgrade either, because it uses pg_dump/pg_dumpall to dump the
schema info of the old cluster and then restore it to the new cluster.
And we know that pg_dump ignores the info in pg_subscription_rel, so
we don't need to change anything, as our changes are specific to the
state of one of the columns in pg_subscription_rel. I have not tested
this, but we should test it by having some relations in not_ready state
and then allowing an old cluster (<= PG13) to be upgraded to the new one
(PG14), both with and without this patch, and seeing if there is any
change in behavior.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-06 Thread Fujii Masao




On 2021/01/05 16:56, Bharath Rupireddy wrote:

On Sat, Jan 2, 2021 at 11:19 AM Bharath Rupireddy
 wrote:

I'm sorry for the mess. I missed adding the new files into the v6-0001
patch. Please ignore the v6 patch set and consder the v7 patch set for
further review. Note that 0002 and 0003 patches have no difference
from v5 patch set.


It seems like the cf bot was failing on the v7 patches. On Linux, it failed
while building the documentation in the 0001 patch; I corrected that.  On
FreeBSD, it failed in one of the test cases I added; since it was
unstable, I corrected it now.

Attaching v8 patch set. Hopefully, cf bot will be happy with v8.

Please consider the v8 patch set for further review.


Thanks for the patch!

-DATA = postgres_fdw--1.0.sql
+DATA = postgres_fdw--1.1.sql postgres_fdw--1.0--1.1.sql

Shouldn't we leave 1.0.sql as it is and create 1.0--1.1.sql so that
we can run the following?

CREATE EXTENSION postgres_fdw VERSION "1.0";
ALTER EXTENSION postgres_fdw UPDATE TO "1.1";


+
+  Functions

The documentation format for functions should be consistent with
that in other contrib modules like pgstattuple?


+   When called in the local session, it returns an array with each element as a
+   pair of the foreign server names of all the open connections that are
+   previously made to the foreign servers and true or
+   false to show whether or not the connection is valid.

We thought that the information about whether the connection is valid or
not was useful to, for example, identify and explicitly close long-living
invalid connections because they are useless. But thanks to the recent
bug fix for the connection leak issue, isn't that information no longer
so helpful for us? False is returned only when the connection is used in
the current local transaction but is marked as invalidated. In that case the
connection cannot be explicitly closed because it's in use in this transaction;
it will be closed at the end of the transaction. Thoughts?


I guess that you made postgres_fdw_get_connections() return an array
because the similar function dblink_get_connections() does that. But
isn't it more convenient to make it return a set of records?
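(For illustration only -- these are hypothetical shapes, since the function is
still under review: an array-returning version is awkward to filter, while a
set-returning version can be used directly in FROM:)

    -- array form (as in the current patch)
    SELECT postgres_fdw_get_connections();

    -- set-of-records form (the suggestion above)
    SELECT server_name, valid
      FROM postgres_fdw_get_connections() AS t(server_name text, valid boolean);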

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Incorrect allocation handling for cryptohash functions with OpenSSL

2021-01-06 Thread Michael Paquier
On Wed, Jan 06, 2021 at 03:58:22PM +0200, Heikki Linnakangas wrote:
> contrib/pgcrypto/internal-sha2.c and
> src/backend/utils/adt/cryptohashfuncs.c: the call to pg_cryptohash_create()
> is missing check for NULL result. With your latest patch, that's OK because
> the subsequent pg_cryptohash_update() calls will fail if passed a NULL
> context. But seems sloppy.

Is it?  pg_cryptohash_create() will never return NULL for the backend.

> contrib/pgcrypto/internal.c: all the calls to pg_cryptohash_* functions are
> missing checks for error return codes.

Indeed, these are incorrect, thanks.  I'll go fix that separately.

> contrib/uuid-ossp/uuid-ossp.c: uses pg_cryptohash for MD5, but borrows the
> built-in implementation of SHA1 on some platforms. Should we add support for
> SHA1 in pg_cryptohash and use that for consistency?

Yeah, I have sent a separate patch for that:
https://commitfest.postgresql.org/31/2868/
The cleanups produced by this patch are kind of nice.

> src/backend/libpq/auth-scram.c: SHA256 is used in the mock authentication.
> If the pg_cryptohash functions fail, we throw a distinct error "could not
> encode salt" that reveals that it was a mock authentication. I don't think
> this is a big deal in practice, it would be hard for an attacker to induce
> the SHA256 computation to fail, and there are probably better ways to
> distinguish mock authentication from real, like timing attacks. But still.

This maps with the second error in the mock routine, so wouldn't it be
better to change both and back-patch?  The last place where this error
message is used is pg_be_scram_build_secret() for the generation of
what's stored in pg_authid.  An idea would be to use "out of memory".
That can be faced for any palloc() calls.

> src/include/common/checksum_helper.h: in pg_checksum_raw_context, do we
> still need separate fields for the different sha variants.

Using separate fields looked cleaner to me if it came to debugging,
and the cleanup was rather minimal except if we use more switch/case
to set up the various variables.  What about something like the
attached?
--
Michael
diff --git a/src/include/common/checksum_helper.h b/src/include/common/checksum_helper.h
index ebdf1ccf44..cac7570ea1 100644
--- a/src/include/common/checksum_helper.h
+++ b/src/include/common/checksum_helper.h
@@ -42,10 +42,7 @@ typedef enum pg_checksum_type
 typedef union pg_checksum_raw_context
 {
 	pg_crc32c	c_crc32c;
-	pg_cryptohash_ctx *c_sha224;
-	pg_cryptohash_ctx *c_sha256;
-	pg_cryptohash_ctx *c_sha384;
-	pg_cryptohash_ctx *c_sha512;
+	pg_cryptohash_ctx *c_sha2;
 } pg_checksum_raw_context;
 
 /*
diff --git a/src/common/checksum_helper.c b/src/common/checksum_helper.c
index f6b49de405..2881b2c178 100644
--- a/src/common/checksum_helper.c
+++ b/src/common/checksum_helper.c
@@ -93,42 +93,42 @@ pg_checksum_init(pg_checksum_context *context, pg_checksum_type type)
 			INIT_CRC32C(context->raw_context.c_crc32c);
 			break;
 		case CHECKSUM_TYPE_SHA224:
-			context->raw_context.c_sha224 = pg_cryptohash_create(PG_SHA224);
-			if (context->raw_context.c_sha224 == NULL)
+			context->raw_context.c_sha2 = pg_cryptohash_create(PG_SHA224);
+			if (context->raw_context.c_sha2 == NULL)
 return -1;
-			if (pg_cryptohash_init(context->raw_context.c_sha224) < 0)
+			if (pg_cryptohash_init(context->raw_context.c_sha2) < 0)
 			{
-pg_cryptohash_free(context->raw_context.c_sha224);
+pg_cryptohash_free(context->raw_context.c_sha2);
 return -1;
 			}
 			break;
 		case CHECKSUM_TYPE_SHA256:
-			context->raw_context.c_sha256 = pg_cryptohash_create(PG_SHA256);
-			if (context->raw_context.c_sha256 == NULL)
+			context->raw_context.c_sha2 = pg_cryptohash_create(PG_SHA256);
+			if (context->raw_context.c_sha2 == NULL)
 return -1;
-			if (pg_cryptohash_init(context->raw_context.c_sha256) < 0)
+			if (pg_cryptohash_init(context->raw_context.c_sha2) < 0)
 			{
-pg_cryptohash_free(context->raw_context.c_sha256);
+pg_cryptohash_free(context->raw_context.c_sha2);
 return -1;
 			}
 			break;
 		case CHECKSUM_TYPE_SHA384:
-			context->raw_context.c_sha384 = pg_cryptohash_create(PG_SHA384);
-			if (context->raw_context.c_sha384 == NULL)
+			context->raw_context.c_sha2 = pg_cryptohash_create(PG_SHA384);
+			if (context->raw_context.c_sha2 == NULL)
 return -1;
-			if (pg_cryptohash_init(context->raw_context.c_sha384) < 0)
+			if (pg_cryptohash_init(context->raw_context.c_sha2) < 0)
 			{
-pg_cryptohash_free(context->raw_context.c_sha384);
+pg_cryptohash_free(context->raw_context.c_sha2);
 return -1;
 			}
 			break;
 		case CHECKSUM_TYPE_SHA512:
-			context->raw_context.c_sha512 = pg_cryptohash_create(PG_SHA512);
-			if (context->raw_context.c_sha512 == NULL)
+			context->raw_context.c_sha2 = pg_cryptohash_create(PG_SHA512);
+			if (context->raw_context.c_sha2 == NULL)
 return -1;
-			if (pg_cryptohash_init(context->raw_context.c_sha512) < 0)
+			if (pg_cryptohash_init(context->raw_context.c_sha2) < 0)
 		

Re: plpgsql variable assignment with union is broken

2021-01-06 Thread Merlin Moncure
On Wed, Jan 6, 2021 at 9:39 PM Tom Lane  wrote:
>
> Merlin Moncure  writes:
> > On Tue, Jan 5, 2021 at 3:40 PM Tom Lane  wrote:
> >> easter...@verfriemelt.org writes:
> >>> i found, that the behaviour of variable assignment in combination with 
> >>> union is not working anymore:
> >>> DO $$
> >>> DECLARE t bool;
> >>> begin
> >>> t := a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS a;
> >>> END $$;
> >>> is this an intended change or is it a bug?
>
> >> It's an intended change, or at least I considered the case and thought
> >> that it was useless because assignment will reject any result with more
> >> than one row.  Do you have any non-toy example that wouldn't be as
> >> clear or clearer without using UNION?  The above sure seems like an
> >> example of awful SQL code.
>
> > What is the definition of broken here?  What is the behavior of the
> > query with the change and why?
>
> The OP is complaining that that gets a syntax error since c9d529848.
>
> > OP's query provably returns a single row and ought to always assign
> > true as written.
>
> My opinion is that (a) it's useless and (b) there has never been any
> documentation that claimed that you could do this.

Here is what the documentation says:

> variable { := | = } expression;
> As explained previously, the expression in such a statement is evaluated by 
> means of an SQL SELECT command sent to the main database engine.

This is valid SQL:
SELECT a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS a;

So I'd argue that OP's query *is* syntactically valid per the rules as
I understand them.

And my opinion is entirely consistent with the documentation in that the query
a) resolves to exactly one row, and
b) is made syntactically valid by prefixing the expression with SELECT.

Aesthetic considerations are irrelevant IMO.

merlin




Re: Parallel Inserts in CREATE TABLE AS

2021-01-06 Thread Bharath Rupireddy
On Thu, Jan 7, 2021 at 5:12 AM Zhihong Yu  wrote:
>
> Hi,
> For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch :
>
> workers to Gather node to 0. With this change, there are chances
> that the planner may choose the parallel plan.
>
> It would be nice if the scenarios where a parallel plan is not chosen are 
> listed.

There are many reasons the planner may not choose a parallel plan for
the select part, for instance if there are temporary tables, parallel-unsafe
functions, foreign tables, or the parallelism GUCs are not set properly,
and so on; see
https://www.postgresql.org/docs/devel/parallel-safety.html. I don't
think we will add all the scenarios to the commit message.

Having said that, we have extensive comments in the code (especially in
the function SetParallelInsertState) about when parallel inserts are
chosen.

+ * Parallel insertions are possible only if the upper node is Gather.
  */
+if (!IsA(gstate, GatherState))
 return;

+ * Parallelize inserts only when the upper Gather node has no projections.
  */
+if (!gstate->ps.ps_ProjInfo)
+{
+/* Okay to parallelize inserts, so mark it. */
+if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+((DR_intorel *) dest)->is_parallel = true;
+
+/*
+ * For parallelizing inserts, we must send some information so that the
+ * workers can build their own dest receivers. For CTAS, this info is
+ * into clause, object id (to open the created table).
+ *
+ * Since the required information is available in the dest receiver,
+ * store a reference to it in the Gather state so that it will be used
+ * in ExecInitParallelPlan to pick the information.
+ */
+gstate->dest = dest;
+}
+else
+{
+/*
+ * Upper Gather node has projections, so parallel insertions are not
+ * allowed.
+ */

> +   if ((root->parse->parallelInsCmdTupleCostOpt &
> +PARALLEL_INSERT_SELECT_QUERY) &&
> +   (root->parse->parallelInsCmdTupleCostOpt &
> +PARALLEL_INSERT_CAN_IGN_TUP_COST))
> +   {
> +   /* We are ignoring the parallel tuple cost, so mark it. */
> +   root->parse->parallelInsCmdTupleCostOpt |=
> +   PARALLEL_INSERT_TUP_COST_IGNORED;
>
> If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY and 
> PARALLEL_INSERT_CAN_IGN_TUP_COST are set, PARALLEL_INSERT_TUP_COST_IGNORED is 
> implied.
>
> Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the setting 
> (1) of the first two bits should suffice.

The way these flags work is as follows: before planning in CTAS, we
set PARALLEL_INSERT_SELECT_QUERY; before we go to generate the upper
Gather path we set PARALLEL_INSERT_CAN_IGN_TUP_COST; and when we
actually ignore the tuple cost in cost_gather we set
PARALLEL_INSERT_TUP_COST_IGNORED. It is possible that we set
PARALLEL_INSERT_CAN_IGN_TUP_COST before calling
generate_useful_gather_paths, and that generate_useful_gather_paths
returns before reaching cost_gather; see the snippets below. So we need
the 3 flags.

void
generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool
override_rows)
{
ListCell   *lc;
doublerows;
double   *rowsp = NULL;
List   *useful_pathkeys_list = NIL;
Path   *cheapest_partial_path = NULL;

/* If there are no partial paths, there's nothing to do here. */
if (rel->partial_pathlist == NIL)
return;

/* Should we override the rel's rowcount estimate? */
if (override_rows)
        rowsp = &rows;

/* generate the regular gather (merge) paths */
generate_gather_paths(root, rel, override_rows);

void
generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
{
Path   *cheapest_partial_path;
Path   *simple_gather_path;
ListCell   *lc;
doublerows;
double   *rowsp = NULL;

/* If there are no partial paths, there's nothing to do here. */
if (rel->partial_pathlist == NIL)
return;


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




RE: [Patch] Optimize dropping of relation buffers using dlist

2021-01-06 Thread Tang, Haiying
>I'd like to take a look at them and redo some of the tests using my machine. I'll
>send my test results in a separate email after this.

I did the same tests with Kirk's scripts using the latest patch on my own
machine. The results look pretty good and are similar to Kirk's.

Average of 5 runs.

[VACUUM failover test for 1000 relations] Unit is seconds, %reg=(patched-master)/master

| s_b   | Master| Patched   | %reg  | 
|--|---|--|--| 
| 128 MB| 0.408 | 0.308 |  -24.44%  | 
| 1 GB  | 0.809 | 0.308 |  -61.94%  | 
| 20 GB | 12.529| 0.308 |  -97.54%  | 
| 100 GB| 59.310| 0.369 |  -99.38%  |

[TRUNCATE failover test for 1000 relations] Unit is seconds, %reg=(patched-master)/master

| s_b   | Master| Patched   | %reg  | 
|--|---|--|--| 
| 128 MB| 0.287 | 0.207 |  -27.91%  | 
| 1 GB  | 0.688 | 0.208 |  -69.84%  | 
| 20 GB | 12.449| 0.208 |  -98.33%  | 
| 100 GB| 61.800| 0.207 |  -99.66%  |

Besides, I did the test for the threshold value again. (I rechecked my test process
and found out that I forgot to check the data synchronization state on the standby,
which may have introduced some noise into my previous results.)
The following results show we can't get further optimization above NBuffers/32, just like
Kirk's test results, so I agree with Kirk on the threshold value.

%regression:
| rel_size |128MB|1GB|20GB| 100GB |
|--||||---| 
| NB/512   | 0% | 0% | 0% | -48%  | 
| NB/256   | 0% | 0% | 0% | -33%  | 
| NB/128   | 0% | 0% | 0% | -9%   | 
| NB/64| 0% | 0% | 0% | -5%   | 
| NB/32| 0% | 0% |-4% | -3%   | 
| NB/16| 0% | 0% |-4% | 1%| 
| NB/8 | 1% | 0% | 1% | 3%|

Optimization details (unit: second):

patched:
| shared_buffers | NBuffers/512 | NBuffers/256 | NBuffers/128 | NBuffers/64 | NBuffers/32 | NBuffers/16 | NBuffers/8 |
|----------------|--------------|--------------|--------------|-------------|-------------|-------------|------------|
| 128M           | 0.107        | 0.107        | 0.107        | 0.106       | 0.107       | 0.107       | 0.107      |
| 1G             | 0.107        | 0.107        | 0.107        | 0.107       | 0.107       | 0.107       | 0.107      |
| 20G            | 0.107        | 0.108        | 0.207        | 0.307       | 0.442       | 0.876       | 1.577      |
| 100G           | 0.208        | 0.308        | 0.559        | 1.060       | 1.961       | 4.567       | 7.922      |

master:
| shared_buffers | NBuffers/512 | NBuffers/256 | NBuffers/128 | NBuffers/64 | NBuffers/32 | NBuffers/16 | NBuffers/8 |
|----------------|--------------|--------------|--------------|-------------|-------------|-------------|------------|
| 128M           | 0.107        | 0.107        | 0.107        | 0.107       | 0.107       | 0.107       | 0.106      |
| 1G             | 0.107        | 0.107        | 0.107        | 0.107       | 0.107       | 0.107       | 0.107      |
| 20G            | 0.107        | 0.107        | 0.208        | 0.308       | 0.457       | 0.910       | 1.560      |
| 100G           | 0.308        | 0.409        | 0.608        | 1.110       | 2.011       | 4.516       | 7.721      |

[Specs]
CPU : 40 processors  (Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz)
Memory: 128G
OS: CentOS 8

Any questions about my tests are welcome.

Regards,
Tang





Re: Single transaction in the tablesync worker?

2021-01-06 Thread Peter Smith
Thank you for the feedback.

On Thu, Jan 7, 2021 at 12:45 PM Hou, Zhijie  wrote:
>
> > PSA the v11 patch for the Tablesync Solution1.
> >
> > Difference from v10:
> > - Addresses several recent review comments.
> > - pg_indent has been run
> >
> Hi
>
> I took a look into the patch and have some comments.
>
> 1.
>   *   So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
> - *   CATCHUP -> SYNCDONE -> READY.
> + *   CATCHUP -> (sync worker TCOPYDONE) -> SYNCDONE -> READY.
>
> I noticed the new state TCOPYDONE is commented between CATCHUP and SYNCDONE,
> But It seems the SUBREL_STATE_TCOPYDONE is actually set before 
> SUBREL_STATE_SYNCWAIT[1].
> Did i miss something here ?
>
> [1]-
> +   UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
> +  
> MyLogicalRepWorker->relid,
> +  
> SUBREL_STATE_TCOPYDONE,
> +  
> MyLogicalRepWorker->relstate_lsn);
> ...
> /*
>  * We are done with the initial data synchronization, update the 
> state.
>  */
> SpinLockAcquire(&MyLogicalRepWorker->relmutex);
> MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT;
> --
>

Thanks for reporting this mistake. I will correct the comment for the
next patch (v12)

>
> 2.
> i = initialize,
> d = data is being copied,
> +   C = table data has been copied,
> s = synchronized,
> r = ready (normal replication)
>
> +#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
> +* 
> (sublsn NULL) */
> The character representing 'data has been copied' in the catalog seems 
> different from the macro define.
>

Yes, the same issue was already reported previously [1].
[1] 
https://www.postgresql.org/message-id/CAA4eK1Kyi037XZzyrLE71MS2KoMmNSqa6RrQLdSCeeL27gnL%2BA%40mail.gmail.com
It will be fixed in the next patch (v12)


Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Terminate the idle sessions

2021-01-06 Thread Tom Lane
Thomas Munro  writes:
> One of the strange things about these errors is that they're
> asynchronous/unsolicited, but they appear to the client to be the
> response to their next request (if it doesn't eat ECONNRESET instead).

Right, which is what makes class 57 (operator intervention) seem
attractive to me.  From the client's standpoint these look little
different from ERRCODE_ADMIN_SHUTDOWN or ERRCODE_CRASH_SHUTDOWN,
which are in that category.

regards, tom lane




Re: vacuum_cost_page_miss default value and modern hardware

2021-01-06 Thread Peter Geoghegan
On Wed, Jan 6, 2021 at 5:39 PM Masahiko Sawada  wrote:
>
> On Mon, Dec 28, 2020 at 5:17 AM Peter Geoghegan  wrote:
> >
> > Simply decreasing vacuum_cost_page_dirty seems like a low risk way of
> > making the VACUUM costing more useful within autovacuum workers.
> > Halving vacuum_cost_page_dirty to 5 would be a good start, though I
> > think that a value as low as 2 would be better. That would make it
> > only 2x vacuum_cost_page_hit's default (i.e 2x the cost of processing
> > a page that is in shared_buffers but did not need to be dirtied),
> > which seems sensible to me when considered in the context in which the
> > value is actually applied (and not some abstract theoretical context).
>
> Perhaps you meant to decrease vacuum_cost_page_miss instead of
> vacuum_cost_page_dirty?
>
> >
> > There are a few reasons why this seems like a good idea now:
> >
> > * Throttling/delaying VACUUM is only useful as a way of smoothing the
> > impact on production queries, which is an important goal, but
> > currently we don't discriminate against the cost that we really should
> > keep under control (dirtying new pages within VACUUM) very well.
> >
> > This is due to the aforementioned trends, the use of a strategy ring
> > buffer by VACUUM, the fact that indexes are usually vacuumed in
> > sequential physical order these days, and many other things that were
> > not a factor in 2004.
> >
> > * There is a real downside to throttling VACUUM unnecessarily, and the
> > effects are *non-linear*. On a large table, the oldest xmin cutoff may
> > become very old by the time we're only (say) half way through the
> > initial table scan in lazy_scan_heap(). There may be relatively little
> > work to do because most of the dead tuples won't be before the oldest
> > xmin cutoff by that time (VACUUM just cannot keep up). Excessive
> > throttling for simple page misses may actually *increase* the amount
> > of I/O that VACUUM has to do over time -- we should try to get to the
> > pages that actually need to be vacuumed quickly, which are probably
> > already dirty anyway (and thus are probably going to add little to the
> > cost delay limit in practice). Everything is connected to everything
> > else.
> >
> > * vacuum_cost_page_miss is very much not like random_page_cost, and
> > the similar names confuse the issue -- this is not an optimization
> > problem. Thinking about VACUUM as unrelated to the workload itself is
> > obviously wrong. Changing the default is also an opportunity to clear
> > that up.
> >
> > Even if I am wrong to suggest that a miss within VACUUM should only be
> > thought of as 2x as expensive as a hit in some *general* sense, I am
> > concerned about *specific* consequences. There is no question about
> > picking the best access path here -- we're still going to have to
> > access the same blocks in the same way sooner or later. In general I
> > think that we should move in the direction of more frequent, cheaper
> > VACUUM operations [1], though we've already done a lot of work in that
> > direction (e.g. freeze map work).
>
> I agree with that direction.
>
> >
> > * Some impact from VACUUM on query performance may in fact be a good thing.
> >
> > Deferring the cost of vacuuming can only make sense if we'll
> > eventually be able to catch up because we're experiencing a surge in
> > demand, which seems kind of optimistic -- it seems more likely that
> > the GC debt will just grow and grow. Why should the DBA not expect to
> > experience some kind of impact, which could be viewed as a natural
> > kind of backpressure? The most important thing is consistent
> > performance.
> >
> > * Other recent work such as the vacuum_cleanup_index_scale_factor
> > patch has increased the relative cost of index vacuuming in some
> > important cases: we don't have a visibility/freeze map for indexes,
> > but index vacuuming that doesn't dirty any pages and has a TID kill
> > list that's concentrated at the end of the heap/table is pretty cheap
> > (the TID binary search is cache efficient/cheap). This change will
> > help these workloads by better reflecting the way in which index
> > vacuuming can be cheap for append-only tables with a small amount of
> > garbage for recently inserted tuples that also got updated/deleted.
> >
> > * Lowering vacuum_cost_page_miss's default (as opposed to changing
> > something else) is a simple and less disruptive way of achieving these
> > goals.
> >
> > This approach seems unlikely to break existing VACUUM-related custom
> > settings from current versions that get reused on upgrade. I expect
> > little impact on small installations.
> >
>
> I recalled the discussion decreasing the default value for
> autovacuum_cost_delay from 20ms to 2ms on PostgreSQL 12. I re-read
> through the discussion but there wasn't the discussion changing
> hit/miss/dirty.
>
> Whereas the change we did for autovacuum_cost_delay affects every
> installation, lowering vacuum_cost_page_miss would bring a different
> 

Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion

2021-01-06 Thread Masahiko Sawada
On Thu, Jan 7, 2021 at 10:59 AM Fujii Masao  wrote:
>
>
>
> On 2021/01/07 10:01, Masahiko Sawada wrote:
> > On Wed, Jan 6, 2021 at 3:37 PM  wrote:
> >>
> >>> +#define Query_for_list_of_cursors \
> >>> +" SELECT name FROM pg_cursors"\
> >>>
> >>> This query should be the following?
> >>>
> >>> " SELECT pg_catalog.quote_ident(name) "\
> >>> "   FROM pg_catalog.pg_cursors "\
> >>> "  WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'"
> >>>
> >>> +/* CLOSE */
> >>> +  else if (Matches("CLOSE"))
> >>> +  COMPLETE_WITH_QUERY(Query_for_list_of_cursors
> >>> +  " UNION ALL SELECT 
> >>> 'ALL'");
> >>>
> >>> "UNION ALL" should be "UNION"?
> >>
> >> Thank you for your advice, and I corrected them.
> >>
>  +   COMPLETE_WITH_QUERY(Query_for_list_of_cursors
>  +   " UNION SELECT 
>  'ABSOLUTE'"
>  +   " UNION SELECT 
>  'BACKWARD'"
>  +   " UNION SELECT 
>  'FORWARD'"
>  +   " UNION SELECT 
>  'RELATIVE'"
>  +   " UNION SELECT 
>  'ALL'"
>  +   " UNION SELECT 
>  'NEXT'"
>  +   " UNION SELECT 
>  'PRIOR'"
>  +   " UNION SELECT 
>  'FIRST'"
>  +   " UNION SELECT 
>  'LAST'"
>  +   " UNION SELECT 
>  'FROM'"
>  +   " UNION SELECT 
>  'IN'");
> 
>  This change makes psql unexpectedly output "FROM" and "IN" just after 
>  "FETCH".
> >>>
> >>> I think "FROM" and "IN" can be placed just after "FETCH". According to
> >>> the documentation, the direction can be empty. For instance, we can do
> >>> like:
> >>
> >> Thank you!
> >>
> >>> I've attached the patch improving the tab completion for DECLARE as an
> >>> example. What do you think?
> >>>
> >>> BTW according to the documentation, the options of DECLARE statement
> >>> (BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive.
> >>>
> >>> DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ]
> >>> CURSOR [ { WITH | WITHOUT } HOLD ] FOR query
> >>>
> >>> But I realized that these options are actually order-insensitive. For
> >>> instance, we can declare a cursor like:
> >>>
> >>> =# declare abc scroll binary cursor for select * from pg_class;
> >>> DECLARE CURSOR
> >>>
> >>> The both parser code and documentation has been unchanged from 2003.
> >>> Is it a documentation bug?
> >>
> >> Thank you for your patch, and it is good.
> >> I cannot find the description "(BINARY, INSENSITIVE, SCROLL, and NO 
> >> SCROLL) are order-sensitive."
> >> I saw "The key words BINARY, INSENSITIVE, and SCROLL can appear in any 
> >> order." , according to the documentation.
> >
> > Thanks, you're right. I missed that sentence. I still think the syntax
> > of the DECLARE statement in the documentation doesn't match its
> > implementation, but I agree that it's order-insensitive.
> >
> >> I made a new patch, but the amount of code was large due to the
> >> order-insensitivity.
> >
> > Thank you for updating the patch!
> >
> > Yeah, I'm also afraid a bit that conditions will exponentially
> > increase when a new option is added to DECLARE statement in the
> > future. Looking at the parser code for DECLARE statement, we can put
> > the same options multiple times (that's also why I don't think it
> > matches). For instance,
> >
> > postgres(1:44758)=# begin;
> > BEGIN
> > postgres(1:44758)=# declare test binary binary binary cursor for
> > select * from pg_class;
> > DECLARE CURSOR
> >
> > So how about simplify the above code as follows?
> >
> > @@ -3005,8 +3014,23 @@ psql_completion(const char *text, int start, int end)
> >  else if (Matches("DECLARE", MatchAny))
> >  COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
> >"CURSOR");
> > +   /*
> > +* Complete DECLARE  with one of BINARY, INSENSITIVE, SCROLL,
> > +* NO SCROLL, and CURSOR. The tail doesn't match any keywords for
> > +* DECLARE, assume we want options.
> > +*/
> > +   else if (HeadMatches("DECLARE", MatchAny, "*") &&
> > +TailMatches(MatchAnyExcept("CURSOR|WITH|WITHOUT|HOLD|FOR")))
> > +   COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
> > + "CURSOR");
>
> This change seems to cause "DECLARE ... CURSOR FOR SELECT " to
> unexpectedly output BINARY, INSENSITIVE, etc.

Indeed. The following is not complete, but isn't it much better?

@@ -3002,11 +3011,18 @@ 

Re: Incorrect allocation handling for cryptohash functions with OpenSSL

2021-01-06 Thread Michael Paquier
On Wed, Jan 06, 2021 at 03:27:03PM +0200, Heikki Linnakangas wrote:
> Looks fine to me.

Thanks, I have been able to get this part done as of 55fe26a.
--
Michael


signature.asc
Description: PGP signature


Re: Some more hackery around cryptohashes (some fixes + SHA1)

2021-01-06 Thread Michael Paquier
On Mon, Dec 14, 2020 at 12:48:15PM +0900, Michael Paquier wrote:
> This is a nice cleanup, so I have moved ahead and applied it.  A
> rebased version of the SHA1 business is attached.

Rebased version attached to address the conflicts caused by 55fe26a.
I have fixed three places in pgcrypto that missed issuing an error if
one of the init/update/final cryptohash calls failed for SHA1.
--
Michael
From fc0819047f18f664efa075a578af66c83854b82c Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 7 Jan 2021 12:31:10 +0900
Subject: [PATCH v3] Introduce SHA1 in cryptohash infrastructure

---
 src/include/common/cryptohash.h   |   1 +
 src/include/common/sha1.h |  19 +
 src/common/Makefile   |   1 +
 src/common/cryptohash.c   |  11 +
 src/common/cryptohash_openssl.c   |   3 +
 src/common/sha1.c | 369 ++
 .../pgcrypto/sha1.h => src/common/sha1_int.h  |  43 +-
 contrib/pgcrypto/Makefile |   2 +-
 contrib/pgcrypto/internal.c   |  34 +-
 contrib/pgcrypto/sha1.c   | 331 
 contrib/uuid-ossp/.gitignore  |   1 -
 contrib/uuid-ossp/Makefile|   6 -
 contrib/uuid-ossp/uuid-ossp.c |  27 +-
 configure |  19 +-
 configure.ac  |  24 +-
 src/Makefile.global.in|   1 -
 src/tools/msvc/Mkvcbuild.pm   |   9 +-
 17 files changed, 478 insertions(+), 423 deletions(-)
 create mode 100644 src/include/common/sha1.h
 create mode 100644 src/common/sha1.c
 rename contrib/pgcrypto/sha1.h => src/common/sha1_int.h (72%)
 delete mode 100644 contrib/pgcrypto/sha1.c

diff --git a/src/include/common/cryptohash.h b/src/include/common/cryptohash.h
index 3ecaf62113..32d7784ca5 100644
--- a/src/include/common/cryptohash.h
+++ b/src/include/common/cryptohash.h
@@ -19,6 +19,7 @@
 typedef enum
 {
 	PG_MD5 = 0,
+	PG_SHA1,
 	PG_SHA224,
 	PG_SHA256,
 	PG_SHA384,
diff --git a/src/include/common/sha1.h b/src/include/common/sha1.h
new file mode 100644
index 00..a61bc47ded
--- /dev/null
+++ b/src/include/common/sha1.h
@@ -0,0 +1,19 @@
+/*-
+ *
+ * sha1.h
+ *	  Constants related to SHA1.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/common/sha1.h
+ *
+ *-
+ */
+#ifndef PG_SHA1_H
+#define PG_SHA1_H
+
+/* Size of result generated by SHA1 computation */
+#define SHA1_DIGEST_LENGTH 20
+
+#endif			/* PG_SHA1_H */
diff --git a/src/common/Makefile b/src/common/Makefile
index f624977939..04a4da0576 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -88,6 +88,7 @@ else
 OBJS_COMMON += \
 	cryptohash.o \
 	md5.o \
+	sha1.o \
 	sha2.o
 endif
 
diff --git a/src/common/cryptohash.c b/src/common/cryptohash.c
index efedadd626..eecf17081b 100644
--- a/src/common/cryptohash.c
+++ b/src/common/cryptohash.c
@@ -25,6 +25,7 @@
 
 #include "common/cryptohash.h"
 #include "md5_int.h"
+#include "sha1_int.h"
 #include "sha2_int.h"
 
 /*
@@ -47,6 +48,7 @@ struct pg_cryptohash_ctx
 	union
 	{
 		pg_md5_ctx	md5;
+		pg_sha1_ctx	sha1;
 		pg_sha224_ctx sha224;
 		pg_sha256_ctx sha256;
 		pg_sha384_ctx sha384;
@@ -97,6 +99,9 @@ pg_cryptohash_init(pg_cryptohash_ctx *ctx)
 		case PG_MD5:
 			pg_md5_init(>data.md5);
 			break;
+		case PG_SHA1:
+			pg_sha1_init(>data.sha1);
+			break;
 		case PG_SHA224:
 			pg_sha224_init(>data.sha224);
 			break;
@@ -132,6 +137,9 @@ pg_cryptohash_update(pg_cryptohash_ctx *ctx, const uint8 *data, size_t len)
 		case PG_MD5:
 			pg_md5_update(>data.md5, data, len);
 			break;
+		case PG_SHA1:
+			pg_sha1_update(>data.sha1, data, len);
+			break;
 		case PG_SHA224:
 			pg_sha224_update(>data.sha224, data, len);
 			break;
@@ -167,6 +175,9 @@ pg_cryptohash_final(pg_cryptohash_ctx *ctx, uint8 *dest)
 		case PG_MD5:
 			pg_md5_final(>data.md5, dest);
 			break;
+		case PG_SHA1:
+			pg_sha1_final(>data.sha1, dest);
+			break;
 		case PG_SHA224:
 			pg_sha224_final(>data.sha224, dest);
 			break;
diff --git a/src/common/cryptohash_openssl.c b/src/common/cryptohash_openssl.c
index 551ec392b6..006e867403 100644
--- a/src/common/cryptohash_openssl.c
+++ b/src/common/cryptohash_openssl.c
@@ -131,6 +131,9 @@ pg_cryptohash_init(pg_cryptohash_ctx *ctx)
 		case PG_MD5:
 			status = EVP_DigestInit_ex(ctx->evpctx, EVP_md5(), NULL);
 			break;
+		case PG_SHA1:
+			status = EVP_DigestInit_ex(ctx->evpctx, EVP_sha1(), NULL);
+			break;
 		case PG_SHA224:
 			status = EVP_DigestInit_ex(ctx->evpctx, EVP_sha224(), NULL);
 			break;
diff --git a/src/common/sha1.c b/src/common/sha1.c
new file mode 100644
index 00..f8ed4d6808

Re: plpgsql variable assignment with union is broken

2021-01-06 Thread Tom Lane
Merlin Moncure  writes:
> On Tue, Jan 5, 2021 at 3:40 PM Tom Lane  wrote:
>> easter...@verfriemelt.org writes:
>>> i found, that the behaviour of variable assignment in combination with 
>>> union is not working anymore:
>>> DO $$
>>> DECLARE t bool;
>>> begin
>>> t := a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS a;
>>> END $$;
>>> is this an intended change or is it a bug?

>> It's an intended change, or at least I considered the case and thought
>> that it was useless because assignment will reject any result with more
>> than one row.  Do you have any non-toy example that wouldn't be as
>> clear or clearer without using UNION?  The above sure seems like an
>> example of awful SQL code.

> What is the definition of broken here?  What is the behavior of the
> query with the change and why?

The OP is complaining that that gets a syntax error since c9d529848.

> OP's query provably returns a single row and ought to always assign
> true as written.

My opinion is that (a) it's useless and (b) there has never been any
documentation that claimed that you could do this.

As for (a), given the execution restriction that you can't return more
than one row, I can't think of anything you could do with this that
couldn't be done, both more clearly and far more cheaply, with a CASE
or similar construct.

As for (b), the docs have always given the syntax of plpgsql assignment as
"variable := expression"; whatever you might think an "expression" is,
I dispute that a syntactically-invalid fragment of a UNION query
qualifies.  The fact that it was accepted is a completely accidental
artifact of the lax way that plpgsql assignment has been parsed up to now.

(Having said that, I would've preserved it if I could, but it's exactly
the fact that it *isn't* a syntactically valid UNION construct that
makes it hard to do so.  If it's possible at all, it would take some
really ugly refactoring of the grammar.)

One last point is that if you're really intent on writing things this
way, you can still do it with SELECT INTO instead of assignment syntax.
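
For example, a minimal sketch of the OP's toy query rewritten that way
(same behaviour: one row, assigns true):

DO $$
DECLARE t bool;
BEGIN
  SELECT a INTO t
    FROM ( SELECT true WHERE false ) s(a)
    UNION SELECT true AS a;
END $$;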

In case you're wondering, INTERSECT and EXCEPT are in the same boat.

regards, tom lane




Re: Phrase search vs. multi-lexeme tokens

2021-01-06 Thread Alexander Korotkov
Hi!

On Wed, Jan 6, 2021 at 8:18 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > # select to_tsvector('pg_class foo') @@ websearch_to_tsquery('"pg_class 
> > foo"');
> >  ?column?
> > --
> >  f
>
> Yeah, surely this is wrong.

Thank you for confirming my thoughts.  I also felt that it was wrong, but
doubted that such a basic bug could exist for so long.

> > # select to_tsquery('pg_class <-> foo');
> >   to_tsquery
> > --
> >  ( 'pg' & 'class' ) <-> 'foo'
>
> > I think if a user writes 'pg_class <-> foo', then it's expected to
> > match 'pg_class foo' independently on which lexemes 'pg_class' is
> > split into.
>
> Indeed.  It seems to me that this:
>
> regression=# select to_tsquery('pg_class');
>to_tsquery
> 
>  'pg' & 'class'
> (1 row)
>
> is wrong all by itself.  Now that we have phrase search, a much
> saner translation would be "'pg' <-> 'class'".  If we fixed that
> then it seems like the more complex case would just work.

Nice idea!  Fixing it this way should be much easier than fixing only the
case where we have the phrase operator at the upper level.
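
For illustration (hypothetical behaviour, assuming multi-lexeme tokens get
translated into phrase subqueries as suggested):

select to_tsquery('pg_class');
-- would be expected to yield: 'pg' <-> 'class'
select to_tsvector('pg_class foo') @@ websearch_to_tsquery('"pg_class foo"');
-- would then be expected to return: t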

> I read your patch over quickly and it seems like a reasonable
> approach (but sadly underdocumented).  Can we extend the idea
> to fix the to_tsquery case?

Sure, I'll provide a revised patch.

--
Regards,
Alexander Korotkov




Re: Terminate the idle sessions

2021-01-06 Thread Thomas Munro
On Thu, Jan 7, 2021 at 3:03 PM Thomas Munro  wrote:
> On Thu, Jan 7, 2021 at 12:55 PM Tom Lane  wrote:
> > * The SQLSTATE you chose for the new error condition seems pretty
> > random.  I do not see it in the SQL standard, so using a code that's
> > within the spec-reserved code range is certainly wrong.  I went with
> > 08P02 (the "P" makes it outside the reserved range), but I'm not
> > really happy either with using class 08 ("Connection Exception",
> > which seems to be mainly meant for connection-request failures),
> > or the fact that ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT is
> > practically identical but it's not even in the same major class.
> > Now 25 ("Invalid Transaction State") is certainly not right for
> > this new error, but I think what that's telling us is that 25 was a
> > misguided choice for the other error.  In a green field I think I'd
> > put both of them in class 53 ("Insufficient Resources") or maybe class
> > 57 ("Operator Intervention").  Can we get away with renumbering the
> > older error at this point?  In any case I'd be inclined to move the
> > new error to 53 or 57 --- anybody have a preference which?
>
> I don't have a strong view here, but 08 with a P doesn't seem crazy to
> me.  Unlike eg 57014 (statement_timeout), 57017 (deadlock_timeout),
> 55P03 (lock_timeout... huh, I just noticed that DB2 uses 57017 for
> that, distinguished from deadlock by another error field), after these
> timeouts you don't have a session/connection anymore.  The two are a
> bit different though: in the older one, you were in a transaction, and
> it seems to me quite newsworthy that your transaction has been
> aborted, information that is not conveyed quite so clearly with a
> connection-related error class.

Hmm, on further reflection it's still more similar than different and
I'd probably have voted for 08xxx for the older one too if I'd been
paying attention.

One of the strange things about these errors is that they're
asynchronous/unsolicited, but they appear to the client to be the
response to their next request (if it doesn't eat ECONNRESET instead).
That makes me think we should try to make it clear that it's a sort of
lower level thing, and not really a response to your next request at
all.  Perhaps 08 does that.  But it's not obvious...  I see a couple
of DB2 extension SQLSTATEs saying you have no connection: 57015 =
"Connection to the local Db2 not established" and 51006 = "A valid
connection has not been established", and then there's standard 08003
= "connection does not exist" which we're currently using to shout
into the void when the *client* goes away (and also for dblink failure
to find named connection, a pretty unrelated meaning).




Re: plpgsql variable assignment with union is broken

2021-01-06 Thread Merlin Moncure
On Tue, Jan 5, 2021 at 3:40 PM Tom Lane  wrote:
>
> easter...@verfriemelt.org writes:
> > i found, that the behaviour of variable assignment in combination with 
> > union is not working anymore:
> >   DO $$
> >   DECLARE t bool;
> >   begin
> >   t := a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS a;
> >   END $$;
>
> > is this an intended change or is it a bug?
>
> It's an intended change, or at least I considered the case and thought
> that it was useless because assignment will reject any result with more
> than one row.  Do you have any non-toy example that wouldn't be as
> clear or clearer without using UNION?  The above sure seems like an
> example of awful SQL code.

What is the definition of broken here?  What is the behavior of the
query with the change and why?

OP's query provably returns a single row and ought to always assign
true as written.  A real world example might evaluate multiple
condition branches so that the assignment resolves true if any branch
is true. It could be rewritten with 'OR' of course.

Is this also "broken"?
  t := a FROM ( SELECT 'something' WHERE _Flag) t(a) UNION SELECT
'something else' AS a WHERE NOT _Flag;

What about this?
SELECT INTO t true WHERE false
UNION select true;

merlin




Re: A failure of standby to follow timeline switch

2021-01-06 Thread Fujii Masao




On 2021/01/05 17:26, Kyotaro Horiguchi wrote:

At Mon, 4 Jan 2021 19:00:21 +0900, Fujii Masao  
wrote in



On 2021/01/04 12:06, Kyotaro Horiguchi wrote:

At Sat, 26 Dec 2020 02:15:06 +0900, Fujii Masao
 wrote in


On 2020/12/25 12:03, Kyotaro Horiguchi wrote:

The attached is the fixed version.


Thanks for updating the patches!


In the first patch, the test added to 001_stream_rep.pl involves two
copied functions related to server-log investigation from
019_repslot_limit.pl.


So you're planning to define them commonly in TestLib.pm or elsewhere?


Yeah.. That's correct. Newly added as the first patch.

While making that change, I extended the interface of slurp_file to
allow reading from arbitrary position.


Is this extension really helpful for the current use case?
At least I'd like to avoid back-patching this since it's an extension...

OsFHandleOpen(my $fh = IO::Handle->new(), $fHandle, 'r')
  or croak "could not read \"$filename\": $^E\n";
+   seek($fh, $from, 0)
+ or croak "could not seek \"$filename\" to $from: $^E\n";

I'm not familiar with this area, but would SetFilePointer() be more
suitable than seek()?



Thanks. The attached is the revised patchset.


Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Transactions involving multiple postgres foreign servers, take 2

2021-01-06 Thread Zhihong Yu
Hi,
For pg-foreign/v31-0004-Add-PrepareForeignTransaction-API.patch :

However these functions are not neither committed nor aborted at

I think the double negation was not intentional. Should be 'are neither ...'

For FdwXactShmemSize(), is another MAXALIGN(size) needed prior to the
return statement ?

+   fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);

For the function name, Fdw and Xact appear twice, each. Maybe one of them
can be dropped ?

+* we don't need to anything for this participant because all
foreign

'need to' -> 'need to do'

+   else if (TransactionIdDidAbort(xid))
+   return FDWXACT_STATUS_ABORTING;
+
the 'else' can be omitted since the preceding if would return.

+   if (max_prepared_foreign_xacts <= 0)

I wonder when the value for max_prepared_foreign_xacts would be negative
(and whether that should be considered an error).

Cheers

On Wed, Jan 6, 2021 at 5:45 PM Masahiko Sawada 
wrote:

> On Mon, Dec 28, 2020 at 11:24 PM Masahiko Sawada 
> wrote:
> >
> > On Wed, Nov 25, 2020 at 9:50 PM Masahiko Sawada 
> wrote:
> > >
> > > Since the previous version conflicts with the current HEAD I've
> > > attached the rebased version patch set.
> >
> > Rebased the patch set again to the current HEAD.
> >
> > The discussion of this patch is very long so here is a short summary
> > of the current state:
> >
> > It’s still under discussion which approaches are the best for the
> > distributed transaction commit as a building block of built-in sharing
> > using foreign data wrappers.
> >
> > Since we’re considering that we use this feature for built-in
> > sharding, the design depends on the architecture of built-in sharding.
> > For example, with the current patch, the PostgreSQL node that received
> > a COMMIT from the client works as a coordinator and it commits the
> > transactions using 2PC on all foreign servers involved with the
> > transaction. This approach would be good with the de-centralized
> > sharding architecture but not with centralized architecture like the
> > GTM node of Postgres-XC and Postgres-XL that is a dedicated component
> > that is responsible for transaction management. Since we don't get a
> > consensus on the built-in sharding architecture yet, it's still an
> > open question that this patch's approach is really good as a building
> > block of the built-in sharding.
> >
> > On the other hand, this feature is not necessarily dedicated to the
> > built-in sharding. For example, the distributed transaction commit
> > through FDW is important also when atomically moving data between two
> > servers via FDWs. Using a dedicated process or server like GTM could
> > be an over solution. Having the node that received a COMMIT work as a
> > coordinator would be better and straight forward.
> >
> > There is no noticeable TODO in the functionality so far covered by
> > this patch set. This patchset adds new FDW APIs to support 2PC,
> > introduces the global transaction manager, and implement those FDW
> > APIs to postgres_fdw. Also, it has regression tests and documentation.
> > Transactions on foreign servers involved with the distributed
> > transaction are committed using 2PC. Committing using 2PC is performed
> > asynchronously and transparently to the user. Therefore, it doesn’t
> > guarantee that transactions on the foreign server are also committed
> > when the client gets an acknowledgment of COMMIT. The patch doesn't
> > cover synchronous foreign transaction commit via 2PC is not covered by
> > this patch as we still need a discussion on the design.
> >
>
> I've attached the rebased patches to make cfbot happy.
>
>
> Regards,
>
> --
> Masahiko Sawada
> EnterpriseDB:  https://www.enterprisedb.com/
>


Re: vacuum_cost_page_miss default value and modern hardware

2021-01-06 Thread Peter Geoghegan
On Wed, Jan 6, 2021 at 5:39 PM Masahiko Sawada  wrote:
> Perhaps you meant to decrease vacuumm_cost_page_miss instead of
> vacuum_cost_page_dirty?

You're right. Evidently I didn't write this email very carefully.
Sorry about that.

To say it again: I think that a miss (without dirtying the page)
should be cheaper than dirtying a page. This thread began because I
wanted to discuss the relative cost of different kinds of I/O
operations to VACUUM, without necessarily discussing the absolute
costs/time delays.
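
For reference, a sketch of the settings in question, with their current
defaults shown as comments (the proposal is only about their relative
values, e.g. making a plain miss cheaper relative to dirtying):

SHOW vacuum_cost_page_hit;    -- 1
SHOW vacuum_cost_page_miss;   -- 10
SHOW vacuum_cost_page_dirty;  -- 20
SET vacuum_cost_page_miss = 3;  -- e.g. a lowered miss cost to experiment with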

-- 
Peter Geoghegan




RE: Enhance traceability of wal_level changes for backup management

2021-01-06 Thread tsunakawa.ta...@fujitsu.com
From: osumi.takami...@fujitsu.com 
> I wondered, couldn't backup management tools utilize the information
> in the backup that needs to be taken when wal_level is changed to "none",
> for example?

IIRC, someone proposed in the original thread that the change count can be 
recorded in pg_control.  The change count is incremented when wal_level is 
changed from replica or higher to minimal or lower.  Maybe you can do it easily 
in XLogReportParameters().

Then, the backup management tool compares the change counts of pg_control in a 
backup and that of the current pg_control.  If the change count is different, 
the tool assumes that the backup cannot be used to recover the database up to 
date.

Ideally, it'd be desirable for PostgreSQL core to have a backup catalog 
management capability like Oracle RMAN.  Then, when the wal_level is changed, 
Postgres may be able to invalidate all backups in the backup catalog.


> As I said before, existing backup management tools support only
> wal_level=replica or logical at present. And, if they would wish to alter
> the status quo and want to cover changes of wal_level, I felt it's natural
> that they support a feature like taking a full backup, triggered by the
> wal_level changes (or server stop).

In that regard, a feature like Oracle Server Alert would be useful.  When
important events occur, the database server records them in the alert queue.
Administration tools read from the alert queue and act accordingly.  A
wal_level change can be recorded in the alert queue, and the backup
management tool polls the queue and detects the change.


Regards
Takayuki Tsunakawa






Re: Terminate the idle sessions

2021-01-06 Thread Thomas Munro
On Thu, Jan 7, 2021 at 12:55 PM Tom Lane  wrote:
> * Thomas' patch for improving timeout.c seems like a great idea, but
> it did indeed have a race condition, and I felt the comments could do
> with more work.

Oops, and thanks!  Very happy to see this one in the tree.

> * I'm not entirely comfortable with the name "idle_session_timeout",
> because it sounds like it applies to all idle states, but actually
> it only applies when we're not in a transaction.  I left the name
> alone and tried to clarify the docs, but I wonder whether a different
> name would be better.  (Alternatively, we could say that it does
> apply in all cases, making the effective timeout when in a transaction
> the smaller of the two GUCs.  But that'd be more complex to implement
> and less flexible, so I'm not in favor of that answer.)

Hmm, it is a bit confusing, but having them separate is indeed more flexible.
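
For instance, a minimal sketch (assuming the GUC names as committed) of how
the two settings can be combined so that in-transaction idling is cut off
sooner than plain idling:

ALTER SYSTEM SET idle_in_transaction_session_timeout = '1min';
ALTER SYSTEM SET idle_session_timeout = '10min';
SELECT pg_reload_conf();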

> * The SQLSTATE you chose for the new error condition seems pretty
> random.  I do not see it in the SQL standard, so using a code that's
> within the spec-reserved code range is certainly wrong.  I went with
> 08P02 (the "P" makes it outside the reserved range), but I'm not
> really happy either with using class 08 ("Connection Exception",
> which seems to be mainly meant for connection-request failures),
> or the fact that ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT is
> practically identical but it's not even in the same major class.
> Now 25 ("Invalid Transaction State") is certainly not right for
> this new error, but I think what that's telling us is that 25 was a
> misguided choice for the other error.  In a green field I think I'd
> put both of them in class 53 ("Insufficient Resources") or maybe class
> 57 ("Operator Intervention").  Can we get away with renumbering the
> older error at this point?  In any case I'd be inclined to move the
> new error to 53 or 57 --- anybody have a preference which?

I don't have a strong view here, but 08 with a P doesn't seem crazy to
me.  Unlike eg 57014 (statement_timeout), 57017 (deadlock_timeout),
55P03 (lock_timeout... huh, I just noticed that DB2 uses 57017 for
that, distinguished from deadlock by another error field), after these
timeouts you don't have a session/connection anymore.  The two are a
bit different though: in the older one, you were in a transaction, and
it seems to me quite newsworthy that your transaction has been
aborted, information that is not conveyed quite so clearly with a
connection-related error class.

> * I noted the discussion about dropping setitimer in place of some
> newer kernel API.  I'm not sure that that is worth the trouble in
> isolation, but it might be worth doing if we can switch the whole
> module over to relying on CLOCK_MONOTONIC, so as to make its behavior
> less flaky if the sysadmin steps the system clock.  Portability
> might be a big issue here though, plus we'd have to figure out how
> we want to define enable_timeout_at(), which is unlikely to want to
> use CLOCK_MONOTONIC values.  In any case, that's surely material
> for a whole new thread.

+1




Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion

2021-01-06 Thread Fujii Masao




On 2021/01/07 10:01, Masahiko Sawada wrote:

On Wed, Jan 6, 2021 at 3:37 PM  wrote:



+#define Query_for_list_of_cursors \
+" SELECT name FROM pg_cursors"\

This query should be the following?

" SELECT pg_catalog.quote_ident(name) "\
"   FROM pg_catalog.pg_cursors "\
"  WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'"

+/* CLOSE */
+  else if (Matches("CLOSE"))
+  COMPLETE_WITH_QUERY(Query_for_list_of_cursors
+  " UNION ALL SELECT 
'ALL'");

"UNION ALL" should be "UNION"?


Thank you for your advice, and I corrected them.


+   COMPLETE_WITH_QUERY(Query_for_list_of_cursors
+   " UNION SELECT 
'ABSOLUTE'"
+   " UNION SELECT 
'BACKWARD'"
+   " UNION SELECT 
'FORWARD'"
+   " UNION SELECT 
'RELATIVE'"
+   " UNION SELECT 'ALL'"
+   " UNION SELECT 'NEXT'"
+   " UNION SELECT 'PRIOR'"
+   " UNION SELECT 'FIRST'"
+   " UNION SELECT 'LAST'"
+   " UNION SELECT 'FROM'"
+   " UNION SELECT 'IN'");

This change makes psql unexpectedly output "FROM" and "IN" just after "FETCH".


I think "FROM" and "IN" can be placed just after "FETCH". According to
the documentation, the direction can be empty. For instance, we can do
like:


Thank you!


I've attached the patch improving the tab completion for DECLARE as an
example. What do you think?

BTW according to the documentation, the options of DECLARE statement
(BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive.

DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ]
CURSOR [ { WITH | WITHOUT } HOLD ] FOR query

But I realized that these options are actually order-insensitive. For
instance, we can declare a cursor like:

=# declare abc scroll binary cursor for select * from pg_class;
DECLARE CURSOR

The both parser code and documentation has been unchanged from 2003.
Is it a documentation bug?


Thank you for your patch, and it is good.
I cannot find the description "(BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are 
order-sensitive."
I saw "The key words BINARY, INSENSITIVE, and SCROLL can appear in any order." 
, according to the documentation.


Thanks, you're right. I missed that sentence. I still think the syntax of
the DECLARE statement in the documentation doesn't match its
implementation, but I agree that it's order-insensitive.


I made a new patch, but the amount of code was large due to the order-insensitivity.


Thank you for updating the patch!

Yeah, I'm also afraid a bit that conditions will exponentially
increase when a new option is added to DECLARE statement in the
future. Looking at the parser code for DECLARE statement, we can put
the same options multiple times (that's also why I don't think it
matches). For instance,

postgres(1:44758)=# begin;
BEGIN
postgres(1:44758)=# declare test binary binary binary cursor for
select * from pg_class;
DECLARE CURSOR

So how about simplify the above code as follows?

@@ -3005,8 +3014,23 @@ psql_completion(const char *text, int start, int end)
 else if (Matches("DECLARE", MatchAny))
 COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
   "CURSOR");
+   /*
+* Complete DECLARE  with one of BINARY, INSENSITIVE, SCROLL,
+* NO SCROLL, and CURSOR. The tail doesn't match any keywords for
+* DECLARE, assume we want options.
+*/
+   else if (HeadMatches("DECLARE", MatchAny, "*") &&
+TailMatches(MatchAnyExcept("CURSOR|WITH|WITHOUT|HOLD|FOR")))
+   COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
+ "CURSOR");


This change seems to cause "DECLARE ... CURSOR FOR SELECT " to
unexpectedly output BINARY, INSENSITIVE, etc.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: When (and whether) should we improve the chapter on parallel query to accommodate parallel data updates?

2021-01-06 Thread tsunakawa.ta...@fujitsu.com
From: Amit Kapila 
> I think each feature should develop the docs as part of feature
> development but if we want to see some additional work like improving
> overall docs for parallel execution as you are suggesting then it can
> be done separately as well.
> 
> I think you have a valid point but maybe it would be better to do the
> improvements you are suggesting once the parallel inserts related work
> is committed.
OK.  I think at least the words query, statement, execution, and/or operation
should be used appropriately before RC (ideally, before beta).  Maybe parallel
copy should also be put in the chapter.  Anyway, I agree that each developer
should focus on their features for the moment (parallel insert select, parallel
copy, parallel CTAS, etc.), and that we then make the documentation consistent
and well-organized after they've settled.  (I just felt a bit uneasy saying
"this patch looks good" while leaving the documentation inconsistent.)


Regards
Takayuki Tsunakawa




RE: Single transaction in the tablesync worker?

2021-01-06 Thread Hou, Zhijie
> PSA the v11 patch for the Tablesync Solution1.
> 
> Difference from v10:
> - Addresses several recent review comments.
> - pg_indent has been run
> 
Hi

I took a look into the patch and have some comments.

1.
  *   So the state progression is always: INIT -> DATASYNC -> SYNCWAIT ->
- *   CATCHUP -> SYNCDONE -> READY.
+ *   CATCHUP -> (sync worker TCOPYDONE) -> SYNCDONE -> READY.

I noticed the new state TCOPYDONE is commented between CATCHUP and SYNCDONE,
but it seems SUBREL_STATE_TCOPYDONE is actually set before
SUBREL_STATE_SYNCWAIT [1].
Did I miss something here?

[1]-
+   UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
+  
MyLogicalRepWorker->relid,
+  
SUBREL_STATE_TCOPYDONE,
+  
MyLogicalRepWorker->relstate_lsn);
...
/*
 * We are done with the initial data synchronization, update the state.
 */
SpinLockAcquire(&MyLogicalRepWorker->relmutex);
MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT;
--


2.
i = initialize,
d = data is being copied,
+   C = table data has been copied,
s = synchronized,
r = ready (normal replication)

+#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed
+* 
(sublsn NULL) */
The character representing 'data has been copied' in the catalog comment seems
different from the macro definition.


Best regards,
houzj




Re: vacuum_cost_page_miss default value and modern hardware

2021-01-06 Thread Masahiko Sawada
On Mon, Dec 28, 2020 at 5:17 AM Peter Geoghegan  wrote:
>
> Simply decreasing vacuum_cost_page_dirty seems like a low risk way of
> making the VACUUM costing more useful within autovacuum workers.
> Halving vacuum_cost_page_dirty to 5 would be a good start, though I
> think that a value as low as 2 would be better. That would make it
> only 2x vacuum_cost_page_hit's default (i.e 2x the cost of processing
> a page that is in shared_buffers but did not need to be dirtied),
> which seems sensible to me when considered in the context in which the
> value is actually applied (and not some abstract theoretical context).

Perhaps you meant to decrease vacuum_cost_page_miss instead of
vacuum_cost_page_dirty?

>
> There are a few reasons why this seems like a good idea now:
>
> * Throttling/delaying VACUUM is only useful as a way of smoothing the
> impact on production queries, which is an important goal, but
> currently we don't discriminate against the cost that we really should
> keep under control (dirtying new pages within VACUUM) very well.
>
> This is due to the aforementioned trends, the use of a strategy ring
> buffer by VACUUM, the fact that indexes are usually vacuumed in
> sequential physical order these days, and many other things that were
> not a factor in 2004.
>
> * There is a real downside to throttling VACUUM unnecessarily, and the
> effects are *non-linear*. On a large table, the oldest xmin cutoff may
> become very old by the time we're only (say) half way through the
> initial table scan in lazy_scan_heap(). There may be relatively little
> work to do because most of the dead tuples won't be before the oldest
> xmin cutoff by that time (VACUUM just cannot keep up). Excessive
> throttling for simple page misses may actually *increase* the amount
> of I/O that VACUUM has to do over time -- we should try to get to the
> pages that actually need to be vacuumed quickly, which are probably
> already dirty anyway (and thus are probably going to add little to the
> cost delay limit in practice). Everything is connected to everything
> else.
>
> * vacuum_cost_page_miss is very much not like random_page_cost, and
> the similar names confuse the issue -- this is not an optimization
> problem. Thinking about VACUUM as unrelated to the workload itself is
> obviously wrong. Changing the default is also an opportunity to clear
> that up.
>
> Even if I am wrong to suggest that a miss within VACUUM should only be
> thought of as 2x as expensive as a hit in some *general* sense, I am
> concerned about *specific* consequences. There is no question about
> picking the best access path here -- we're still going to have to
> access the same blocks in the same way sooner or later. In general I
> think that we should move in the direction of more frequent, cheaper
> VACUUM operations [1], though we've already done a lot of work in that
> direction (e.g. freeze map work).

I agree with that direction.

>
> * Some impact from VACUUM on query performance may in fact be a good thing.
>
> Deferring the cost of vacuuming can only make sense if we'll
> eventually be able to catch up because we're experiencing a surge in
> demand, which seems kind of optimistic -- it seems more likely that
> the GC debt will just grow and grow. Why should the DBA not expect to
> experience some kind of impact, which could be viewed as a natural
> kind of backpressure? The most important thing is consistent
> performance.
>
> * Other recent work such as the vacuum_cleanup_index_scale_factor
> patch has increased the relative cost of index vacuuming in some
> important cases: we don't have a visibility/freeze map for indexes,
> but index vacuuming that doesn't dirty any pages and has a TID kill
> list that's concentrated at the end of the heap/table is pretty cheap
> (the TID binary search is cache efficient/cheap). This change will
> help these workloads by better reflecting the way in which index
> vacuuming can be cheap for append-only tables with a small amount of
> garbage for recently inserted tuples that also got updated/deleted.
>
> * Lowering vacuum_cost_page_miss's default (as opposed to changing
> something else) is a simple and less disruptive way of achieving these
> goals.
>
> This approach seems unlikely to break existing VACUUM-related custom
> settings from current versions that get reused on upgrade. I expect
> little impact on small installations.
>

I recalled the discussion about decreasing the default value of
autovacuum_cost_delay from 20ms to 2ms in PostgreSQL 12. I re-read
the discussion, but there was no discussion of changing
hit/miss/dirty.

Whereas the change we did for autovacuum_cost_delay affects every
installation, lowering vacuum_cost_page_miss would have a different
impact depending on the workload, database size, etc. For example, the
user would see a larger I/O spike in a case where the database
doesn't fit in the server's RAM and VACUUM is processing cold
tables/indexes, for example, when anti-wraparound 

Re: [PATCH] Simple progress reporting for COPY command

2021-01-06 Thread Justin Pryzby
On Wed, Jan 06, 2021 at 10:44:49PM +0100, Tomas Vondra wrote:
> On 1/5/21 11:02 AM, Josef Šimánek wrote:
> > I'm attaching the whole patch since commitfest failed to ingest the
> > last incremental on CI.
> > 
> 
> Yeah, the whole patch needs to be attached for the commitfest tester to work
> correctly - it can't apply pieces from multiple messages, etc.
> 
> Anyway, I pushed this last version of patch, after a couple more tweaks,
> mainly to the docs - one place used pg_stat_copy_progress, the section was
> not indexed properly, and so on.

Some more doc fixes.
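
(Unrelated to the wording fixes: a quick sketch of how the documented view is
meant to be used, with hypothetical table/file names; run the COPY in one
session and query the view from another while it is running:)

COPY my_table FROM '/tmp/my_table.csv' (FORMAT csv);
-- in a second session:
SELECT * FROM pg_stat_progress_copy;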

>From 0616f448057905ab9b3218ebddfdd3af52e62bac Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 6 Jan 2021 19:08:11 -0600
Subject: [PATCH] doc review: COPY progress:
 8a4f618e7ae3cb11b0b37d0f06f05c8ff905833f

---
 doc/src/sgml/monitoring.sgml | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 43fe8ae383..3cdb1aff3c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6414,9 +6414,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
Whenever COPY is running, the
pg_stat_progress_copy view will contain one row
-   for each backend that is currently running COPY command.
-   The table bellow describes the information that will be reported and provide
-   information how to interpret it.
+   for each backend that is currently running a COPY 
command.
+   The table below describes the information that will be reported and provides
+   information about how to interpret it.
   
 
   
@@ -6445,7 +6445,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
   
-   datid text
+   datid oid
   
   
OID of the database to which this backend is connected.
@@ -6467,7 +6467,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
   
OID of the table on which the COPY command is 
executed.
-   It is set to 0 if SELECT query is provided.
+   It is set to 0 if copying from a SELECT query.
   
  
 
-- 
2.17.0




Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion

2021-01-06 Thread Masahiko Sawada
On Wed, Jan 6, 2021 at 3:37 PM  wrote:
>
> >+#define Query_for_list_of_cursors \
> >+" SELECT name FROM pg_cursors"\
> >
> >This query should be the following?
> >
> >" SELECT pg_catalog.quote_ident(name) "\
> >"   FROM pg_catalog.pg_cursors "\
> >"  WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'"
> >
> >+/* CLOSE */
> >+  else if (Matches("CLOSE"))
> >+  COMPLETE_WITH_QUERY(Query_for_list_of_cursors
> >+  " UNION ALL SELECT 
> >'ALL'");
> >
> >"UNION ALL" should be "UNION"?
>
> Thank you for your advice, and I corrected them.
>
> >> +   COMPLETE_WITH_QUERY(Query_for_list_of_cursors
> >> +   " UNION SELECT 
> >> 'ABSOLUTE'"
> >> +   " UNION SELECT 
> >> 'BACKWARD'"
> >> +   " UNION SELECT 
> >> 'FORWARD'"
> >> +   " UNION SELECT 
> >> 'RELATIVE'"
> >> +   " UNION SELECT 
> >> 'ALL'"
> >> +   " UNION SELECT 
> >> 'NEXT'"
> >> +   " UNION SELECT 
> >> 'PRIOR'"
> >> +   " UNION SELECT 
> >> 'FIRST'"
> >> +   " UNION SELECT 
> >> 'LAST'"
> >> +   " UNION SELECT 
> >> 'FROM'"
> >> +   " UNION SELECT 
> >> 'IN'");
> >>
> >> This change makes psql unexpectedly output "FROM" and "IN" just after 
> >> "FETCH".
> >
> >I think "FROM" and "IN" can be placed just after "FETCH". According to
> >the documentation, the direction can be empty. For instance, we can do
> >like:
>
> Thank you!
>
> >I've attached the patch improving the tab completion for DECLARE as an
> >example. What do you think?
> >
> >BTW according to the documentation, the options of DECLARE statement
> >(BINARY, INSENSITIVE, SCROLL, and NO SCROLL) are order-sensitive.
> >
> >DECLARE name [ BINARY ] [ INSENSITIVE ] [ [ NO ] SCROLL ]
> >CURSOR [ { WITH | WITHOUT } HOLD ] FOR query
> >
> >But I realized that these options are actually order-insensitive. For
> >instance, we can declare a cursor like:
> >
> >=# declare abc scroll binary cursor for select * from pg_class;
> >DECLARE CURSOR
> >
> >The both parser code and documentation has been unchanged from 2003.
> >Is it a documentation bug?
>
> Thank you for your patch, and it is good.
> I cannot find the description "(BINARY, INSENSITIVE, SCROLL, and NO SCROLL) 
> are order-sensitive."
> I saw "The key words BINARY, INSENSITIVE, and SCROLL can appear in any 
> order." , according to the documentation.

Thanks, you're right. I missed that sentence. I still think the syntax of
the DECLARE statement in the documentation doesn't match its
implementation, but I agree that it's order-insensitive.

> I made a new patch, but the amount of codes was large due to 
> order-insensitive.

Thank you for updating the patch!

Yeah, I'm also a bit afraid that the conditions will increase
exponentially when a new option is added to the DECLARE statement in the
future. Looking at the parser code for the DECLARE statement, we can specify
the same option multiple times (that's also why I don't think it
matches). For instance,

postgres(1:44758)=# begin;
BEGIN
postgres(1:44758)=# declare test binary binary binary cursor for
select * from pg_class;
DECLARE CURSOR

So how about simplify the above code as follows?

@@ -3005,8 +3014,23 @@ psql_completion(const char *text, int start, int end)
else if (Matches("DECLARE", MatchAny))
COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
  "CURSOR");
+   /*
+* Complete DECLARE  with one of BINARY, INSENSITIVE, SCROLL,
+* NO SCROLL, and CURSOR. The tail doesn't match any keywords for
+* DECLARE, assume we want options.
+*/
+   else if (HeadMatches("DECLARE", MatchAny, "*") &&
+TailMatches(MatchAnyExcept("CURSOR|WITH|WITHOUT|HOLD|FOR")))
+   COMPLETE_WITH("BINARY", "INSENSITIVE", "SCROLL", "NO SCROLL",
+ "CURSOR");
+   /*
+* Complete DECLARE  ... CURSOR with one of WITH HOLD, WITHOUT HOLD,
+* and FOR.
+*/
else if (HeadMatches("DECLARE") && TailMatches("CURSOR"))
COMPLETE_WITH("WITH HOLD", "WITHOUT HOLD", "FOR");
+   else if (HeadMatches("DECLARE") && TailMatches("HOLD"))
+   COMPLETE_WITH("FOR");

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: list of extended statistics on psql

2021-01-06 Thread Tomas Vondra




On 1/7/21 1:46 AM, Tatsuro Yamada wrote:

Hi Tomas,


Thanks, and Happy new year to you too.

I almost pushed this, but I have one more question. listExtendedStats 
first checks the server version, and errors out for pre-10 servers. 
Shouldn't the query-building logic check the server version too, to 
decide whether to include the MCV stats? Otherwise it won't work on 10 
and 11, which do not support the "mcv" stats.
I don't recall what exactly is our policy regarding new psql with 
older server versions, but it seems strange to check for 10.0 and 
then fail anyway for "supported" versions.


Thanks for your comments.

I overlooked the check for MCV in the query-building logic
because I created the patch as a new feature for PG14.
I'm not sure whether we should back-patch it or not. However, I'll
add the check in the next patch because it will be useful if you decide to
back-patch to PG 10, 11, 12, and 13.



+1

BTW perhaps a quick look at the other \d commands would show if there 
are precedents. I didn't have time for that.



I wonder whether the column names added by \dX+ are fine. For example,
"Ndistinct_size" and "Dependencies_size" look like long names -
are they acceptable?



Seems acceptable - I don't have a better idea.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: list of extended statistics on psql

2021-01-06 Thread Tatsuro Yamada

Hi Tomas,


Thanks, and Happy new year to you too.

I almost pushed this, but I have one more question. listExtendedStats first checks the 
server version, and errors out for pre-10 servers. Shouldn't the query-building logic 
check the server version too, to decide whether to include the MCV stats? Otherwise it 
won't work on 10 and 11, which do not support the "mcv" stats.

I don't recall what exactly is our policy regarding new psql with older server versions, 
but it seems strange to check for 10.0 and then fail anyway for "supported" 
versions.


Thanks for your comments.

I overlooked the check for MCV in the query-building logic
because I created the patch as a new feature for PG14.
I'm not sure whether we should back-patch it or not. However, I'll
add the check in the next patch because it will be useful if you decide to
back-patch to PG 10, 11, 12, and 13.
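
Something like the following in the query-building code is what I have in
mind (just a rough sketch; the buffer name and the column expression are
only illustrative, not the exact query in the patch):

/* The "mcv" statistics kind only exists on server version 12 and later. */
if (pset.sversion >= 120000)
    appendPQExpBufferStr(&buf,
                         /* "es" is assumed to be the pg_statistic_ext alias */
                         ",\nCASE WHEN 'm' = any(es.stxkind)"
                         " THEN 'defined' END AS \"MCV\"");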

I wonder whether the column names added by \dX+ are fine. For example,
"Ndistinct_size" and "Dependencies_size" look like long names -
are they acceptable?

Regards,
Tatsuro Yamada






Re: logical replication worker accesses catalogs in error context callback

2021-01-06 Thread Masahiko Sawada
On Wed, Jan 6, 2021 at 11:42 AM Masahiko Sawada  wrote:
>
> On Wed, Jan 6, 2021 at 11:02 AM Andres Freund  wrote:
> >
> > Hi,
> >
> > Due to a debug ereport I just noticed that worker.c's
> > slot_store_error_callback is doing something quite dangerous:
> >
> > static void
> > slot_store_error_callback(void *arg)
> > {
> > SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg;
> > LogicalRepRelMapEntry *rel;
> > char   *remotetypname;
> > Oid remotetypoid,
> > localtypoid;
> >
> > /* Nothing to do if remote attribute number is not set */
> > if (errarg->remote_attnum < 0)
> > return;
> >
> > rel = errarg->rel;
> > remotetypoid = rel->remoterel.atttyps[errarg->remote_attnum];
> >
> > /* Fetch remote type name from the LogicalRepTypMap cache */
> > remotetypname = logicalrep_typmap_gettypname(remotetypoid);
> >
> > /* Fetch local type OID from the local sys cache */
> > localtypoid = get_atttype(rel->localreloid, errarg->local_attnum + 
> > 1);
> >
> > errcontext("processing remote data for replication target relation 
> > \"%s.%s\" column \"%s\", "
> >"remote type %s, local type %s",
> >rel->remoterel.nspname, rel->remoterel.relname,
> >rel->remoterel.attnames[errarg->remote_attnum],
> >remotetypname,
> >format_type_be(localtypoid));
> > }
> >
> >
> > that's not code that can run in an error context callback. It's
> > theoretically possible (but unlikely) that
> > logicalrep_typmap_gettypname() is safe to run in an error context
> > callback. But get_atttype() definitely isn't.
> >
> > get_attype() may do catalog accesses. That definitely can't happen
> > inside an error context callback - the entire transaction might be
> > borked at this point!
>
> You're right. Perhaps calling format_type_be() is also dangerous
> since it does catalog access. We should have added the local type
> names to SlotErrCallbackArg so we avoid catalog access in the error
> context.
>
> I'll try to fix this.

Attached the patch that fixes this issue.

Since logicalrep_typmap_gettypname() could search the syscache by
calling format_type_be(), I stored both the local and remote type names
in SlotErrCallbackArg so that we can just use the names in the error
callback without any syscache lookup.
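
The shape of the change is roughly like this (a simplified sketch of the
approach rather than the exact patch; the new field names may differ):

typedef struct SlotErrCallbackArg
{
    LogicalRepRelMapEntry *rel;
    int         remote_attnum;
    char       *remotetypname;   /* resolved before the conversion starts */
    char       *localtypname;    /* resolved before the conversion starts */
} SlotErrCallbackArg;

static void
slot_store_error_callback(void *arg)
{
    SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg;

    /* Nothing to do if remote attribute number is not set */
    if (errarg->remote_attnum < 0)
        return;

    /* Use only the precomputed names here -- no catalog or syscache access */
    errcontext("processing remote data for replication target relation \"%s.%s\" column \"%s\", "
               "remote type %s, local type %s",
               errarg->rel->remoterel.nspname,
               errarg->rel->remoterel.relname,
               errarg->rel->remoterel.attnames[errarg->remote_attnum],
               errarg->remotetypname,
               errarg->localtypname);
}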

Please review it.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/


fix_slot_store_error_callback.patch
Description: Binary data


Re: Terminate the idle sessions

2021-01-06 Thread Tom Lane
Li Japin  writes:
> [ v9-0001-Allow-terminating-the-idle-sessions.patch ]

I've reviewed and pushed this.  A few notes:

* Thomas' patch for improving timeout.c seems like a great idea, but
it did indeed have a race condition, and I felt the comments could do
with more work.

* I'm not entirely comfortable with the name "idle_session_timeout",
because it sounds like it applies to all idle states, but actually
it only applies when we're not in a transaction.  I left the name
alone and tried to clarify the docs, but I wonder whether a different
name would be better.  (Alternatively, we could say that it does
apply in all cases, making the effective timeout when in a transaction
the smaller of the two GUCs.  But that'd be more complex to implement
and less flexible, so I'm not in favor of that answer.)

* The SQLSTATE you chose for the new error condition seems pretty
random.  I do not see it in the SQL standard, so using a code that's
within the spec-reserved code range is certainly wrong.  I went with
08P02 (the "P" makes it outside the reserved range), but I'm not
really happy either with using class 08 ("Connection Exception",
which seems to be mainly meant for connection-request failures),
or the fact that ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT is
practically identical but it's not even in the same major class.
Now 25 ("Invalid Transaction State") is certainly not right for
this new error, but I think what that's telling us is that 25 was a
misguided choice for the other error.  In a green field I think I'd
put both of them in class 53 ("Insufficient Resources") or maybe class
57 ("Operator Intervention").  Can we get away with renumbering the
older error at this point?  In any case I'd be inclined to move the
new error to 53 or 57 --- anybody have a preference which?

* I think the original intent in timeout.h was to have 10 slots
available for run-time-defined timeout reasons.  This is the third
predefined one we've added since the header was created, so it's
starting to look a little tight.  I adjusted the code to hopefully
preserve 10 free slots going forward.

* I noted the discussion about dropping setitimer in place of some
newer kernel API.  I'm not sure that that is worth the trouble in
isolation, but it might be worth doing if we can switch the whole
module over to relying on CLOCK_MONOTONIC, so as to make its behavior
less flaky if the sysadmin steps the system clock.  Portability
might be a big issue here though, plus we'd have to figure out how
we want to define enable_timeout_at(), which is unlikely to want to
use CLOCK_MONOTONIC values.  In any case, that's surely material
for a whole new thread.

regards, tom lane




Re: Parallel Inserts in CREATE TABLE AS

2021-01-06 Thread Zhihong Yu
Hi,
For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch :

workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan.

It would be nice if the scenarios where a parallel plan is not chosen were
listed.

+   if ((root->parse->parallelInsCmdTupleCostOpt &
+PARALLEL_INSERT_SELECT_QUERY) &&
+   (root->parse->parallelInsCmdTupleCostOpt &
+PARALLEL_INSERT_CAN_IGN_TUP_COST))
+   {
+   /* We are ignoring the parallel tuple cost, so mark it. */
+   root->parse->parallelInsCmdTupleCostOpt |=
+
PARALLEL_INSERT_TUP_COST_IGNORED;

If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY
and PARALLEL_INSERT_CAN_IGN_TUP_COST are
set, PARALLEL_INSERT_TUP_COST_IGNORED is implied.

Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the setting
(1) of the first two bits should suffice.
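
To illustrate, something like the following (the helper name is made up;
only the two existing flags from the patch are used) would convey the same
information without the third enum value:

/*
 * Hypothetical helper, for illustration only: the "ignored" state is fully
 * determined by the two existing flag bits, so no third flag is needed.
 */
static inline bool
parallel_insert_tuple_cost_ignored(int tupleCostOpt)
{
    return (tupleCostOpt & PARALLEL_INSERT_SELECT_QUERY) != 0 &&
           (tupleCostOpt & PARALLEL_INSERT_CAN_IGN_TUP_COST) != 0;
}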

Cheers

On Mon, Jan 4, 2021 at 7:59 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy
>  wrote:
> >
> > > +   if
> (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
> > > +   ((DR_intorel *)
> gstate->dest)->into->rel &&
> > > +   ((DR_intorel *)
> gstate->dest)->into->rel->relname)
> > > why would rel and relname not be there? if no rows have been inserted?
> > > because it seems from the intorel_startup function that that would be
> > > set as soon as startup was done, which i assume (wrongly?) is always
> done?
> >
> > Actually, that into clause rel variable is always being set in the
> gram.y for CTAS, Create Materialized View and SELECT INTO (because
> qualified_name non-terminal is not optional). My bad. I just added it as a
> sanity check. Actually, it's not required.
> >
> > create_as_target:
> > qualified_name opt_column_list table_access_method_clause
> > OptWith OnCommitOption OptTableSpace
> > {
> > $$ = makeNode(IntoClause);
> > $$->rel = $1;
> > create_mv_target:
> > qualified_name opt_column_list table_access_method_clause
> opt_reloptions OptTableSpace
> > {
> > $$ = makeNode(IntoClause);
> > $$->rel = $1;
> > into_clause:
> > INTO OptTempTableName
> > {
> > $$ = makeNode(IntoClause);
> >$$->rel = $2;
> >
> > I will change the below code:
> > +if (GetParallelInsertCmdType(gstate->dest) ==
> > +PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
> > +((DR_intorel *) gstate->dest)->into &&
> > +((DR_intorel *) gstate->dest)->into->rel &&
> > +((DR_intorel *)
> gstate->dest)->into->rel->relname)
> > +{
> >
> > to:
> > +if (GetParallelInsertCmdType(gstate->dest) ==
> > +PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> > +{
> >
> > I will update this in the next version of the patch set.
>
> Attaching v20 patch set that has above change in 0001 patch, note that
> 0002 to 0004 patches have no changes from v19. Please consider the v20
> patch set for further review.
>
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: list of extended statistics on psql

2021-01-06 Thread Tomas Vondra

On 1/5/21 5:26 AM, Tatsuro Yamada wrote:

Hi,


I rebased the patch set on the master (7e5e1bba03), and the regression
test is good. Therefore, I changed the status of the patch: "needs 
review". 


Happy New Year!

I rebased my patches on HEAD.
Please find attached files. :-D



Thanks, and Happy new year to you too.

I almost pushed this, but I have one more question. listExtendedStats 
first checks the server version, and errors out for pre-10 servers. 
Shouldn't the query-building logic check the server version too, to 
decide whether to include the MCV stats? Otherwise it won't work on 10 and 
11, which do not support the "mcv" stats.


I don't recall what exactly is our policy regarding new psql with older 
server versions, but it seems strange to check for 10.0 and then fail 
anyway for "supported" versions.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Peter Geoghegan
On Wed, Jan 6, 2021 at 1:29 PM Michael Banck  wrote:
> That one seems to be 5min everywhere, and one can change it except on
> Azure.

Okay, thanks for clearing that up. Looks like all of the big 3 cloud
providers use Postgres checksums in a straightforward way.

I don't have much more to say on this thread. I am -1 on the current
proposal to enable page-level checksums by default. I may change my
mind on this in the future, perhaps when underlying architectural
details change, but right now this seems like a net negative.

-- 
Peter Geoghegan




Re: [PATCH] Simple progress reporting for COPY command

2021-01-06 Thread Tomas Vondra

On 1/5/21 11:02 AM, Josef Šimánek wrote:

I'm attaching the whole patch since commitfest failed to ingest the
last incremental on CI.



Yeah, the whole patch needs to be attached for the commitfest tester to 
work correctly - it can't apply pieces from multiple messages, etc.


Anyway, I pushed this last version of the patch, after a couple more tweaks, 
mainly to the docs - one place used pg_stat_copy_progress, the section 
was not indexed properly, and so on.


I see Matthias proposed to change "lines" to "tuples" - I only saw the 
message after pushing, but I probably wouldn't make that change anyway. 
The CSV docs seem to talk about lines, newlines etc. so it seems fine. 
If not, we can change that.


One more question, though - I now realize the lines_processed ignores 
rows skipped because of BEFORE INSERT triggers. I wonder if that's the 
right thing to do? Imagine you know the number of lines in a file. You 
can't really use (lines_processed / total_lines) to measure progress, 
because that may ignore many "skipped" rows. So maybe this should be 
changed to count all rows. OTOH we still have bytes_processed.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Michael Banck
Hi,

Am Mittwoch, den 06.01.2021, 13:19 -0800 schrieb Peter Geoghegan:
> On Wed, Jan 6, 2021 at 1:08 PM Peter Geoghegan  wrote:
> > So you tested this using "show data_checksums;" in psql in each case?
> > 
> > What does "show full_page_writes;" show you?
> 
> Another consideration is checkpoint_timeout 

That one seems to be 5min everywhere, and one can change it except on
Azure.

> and max_wal_size.

I think this one is usually based on instance size, but I only have
access to two differently sized instances on Azure right now, where it is
1GB for the smallest possible one and 32GB for a production instance.
It's 1GB for the small Google Cloud SQL Postgres and 2GB for the small
RDS test instance. It is user-changeable everywhere (at least to some
degree).


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz





Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Peter Geoghegan
On Wed, Jan 6, 2021 at 1:08 PM Peter Geoghegan  wrote:
> So you tested this using "show data_checksums;" in psql in each case?
>
> What does "show full_page_writes;" show you?

Another consideration is checkpoint_timeout and max_wal_size.

-- 
Peter Geoghegan




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Michael Banck
Am Mittwoch, den 06.01.2021, 13:08 -0800 schrieb Peter Geoghegan:
> On Wed, Jan 6, 2021 at 1:04 PM Michael Banck  
> wrote:
> > At least data_checksums=on for Azure Managed Postgres, Amazon RDS and
> > Google Cloud SQL Postgres. It might not be on for all types (I just
> > checked the basic one each earlier today), but I guess it is.
> 
> So you tested this using "show data_checksums;" in psql in each case?

Yes.

> What does "show full_page_writes;" show you?

It's also 'on' for all three (wal_log_hints is 'off' everywhere).


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz





Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Peter Geoghegan
On Wed, Jan 6, 2021 at 1:04 PM Michael Banck  wrote:
> At least data_checksums=on for Azure Managed Postgres, Amazon RDS and
> Google Cloud SQL Postgres. It might not be on for all types (I just
> checked the basic one each earlier today), but I guess it is.

So you tested this using "show data_checksums;" in psql in each case?

What does "show full_page_writes;" show you?

-- 
Peter Geoghegan




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Michael Banck
Hi,

Am Mittwoch, den 06.01.2021, 12:56 -0800 schrieb Peter Geoghegan:
> On Wed, Jan 6, 2021 at 12:30 PM Stephen Frost  wrote:
> > As already mentioned, it's also, at least today, far
> > simpler to disable checksums than to enable them, which is something
> > else to consider when thinking about what the default should be.
> 
> That is a valid concern. I just don't think that it's good enough on
> its own, given the overwhelming downside of enabling checksums given
> the WAL architecture that we have today.
> 
> > That the major cloud providers all have checksums enabled (at least by
> > default, though I wonder if they would even let you turn them off..),

I don't think so, and it would be very weird if they did, not just due
to the fact that shutting down the instance and running pg_checksums is
only possible since v13 (and only Google Cloud SQL Postgres supports
that so far), but also because this is the kind of decision cloud
providers tend to make for their clients without allowing the users any say
in it (just like how they do backups or failovers).

> > even when we don't have them on by default, strikes me as pretty telling
> > that this is something that we should have on by default.
> 
> Please provide supporting evidence. I know that EBS itself uses
> checksums at the block device level, so I'm sure that RDS "uses
> checksums" in some sense. But does RDS use --data-checksums during
> initdb?

At least data_checksums=on for Azure Managed Postgres, Amazon RDS and
Google Cloud SQL Postgres. It might not be on for all types (I just
checked the basic one each earlier today), but I guess it is.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz





Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Peter Geoghegan
On Wed, Jan 6, 2021 at 12:30 PM Stephen Frost  wrote:
> As already mentioned, it's also, at least today, far
> simpler to disable checksums than to enable them, which is something
> else to consider when thinking about what the default should be.

That is a valid concern. I just don't think that it's good enough on
its own, given the overwhelming downside of enabling checksums given
the WAL architecture that we have today.

> That the major cloud providers all have checksums enabled (at least by
> default, though I wonder if they would even let you turn them off..),
> even when we don't have them on by default, strikes me as pretty telling
> that this is something that we should have on by default.

Please provide supporting evidence. I know that EBS itself uses
checksums at the block device level, so I'm sure that RDS "uses
checksums" in some sense. But does RDS use --data-checksums during
initdb?

> Certainly there's a different risk profile between the two and there may
> be times when someone is fine with running without fsync, or fine
> running without checksums, but those are, in my view, exceptions made
> once you understand exactly what risk you're willing to accept, and not
> what the default or typical deployment should be.

Okay, I'll bite. Here is the important difference: Enabling checksums
doesn't actually make data corruption less likely, it just makes it
easier to detect. Whereas disabling fsync will reliably produce
corruption before too long in almost any installation. It may
occasionally be appropriate to disable fsync in a very controlled
environment, but it's rare, and not much faster than disabling
synchronous commits in any case. It barely ever happens.

We added page-level checksums in 9.3. Can you imagine a counterfactual
history in which Postgres had page checksums since the 1990s, but only
added the fsync feature in 9.3? Please answer this non-rhetorical
question.

-- 
Peter Geoghegan




Re: psql \df choose functions by their arguments

2021-01-06 Thread Greg Sabino Mullane
On Sat, Jan 2, 2021 at 1:56 AM Thomas Munro  wrote:

> ...
> It looks like there is a collation dependency here that causes the
> test to fail on some systems:
>

Thanks for pointing that out. I tweaked the function definitions to
hopefully sidestep the ordering issue - attached is v4.

Cheers,
Greg


v4-psql-df-pick-function-by-type.patch
Description: Binary data


Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Stephen Frost
Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Wed, Jan 6, 2021 at 12:03 PM Stephen Frost  wrote:
> > Do you really believe it to be wrong?  Do we stop performing the correct
> > write calls in the correct order to the kernel with fsync being off?  If
> > the kernel actually handles all of our write calls correctly and we
> > cleanly shut down and the kernel cleanly shuts down and sync's the disks
> > before a reboot, will there be corruption from running with fsync off?
> 
> This is a total straw man. Everyone understands the technical issues
> with fsync perfectly well, and everyone understands that everyone
> understands the issue, so spare me the "I'm just a humble country
> lawyer" style explanation.
> 
> What you seem to be arguing is that the differences between disabling
> checksums and disabling fsync is basically quantitative, and so making
> a qualitative distinction between those two things is meaningless, and
> that it logically follows that disagreeing with you is essentially
> irresponsible. This is a tactic that would be an embarrassment to a
> high school debate team. It's below you.

I can agree that there's a usefulness in making a qualitative
distinction between them, but we're talking about a default here, not
about if we should even have these options or these capabilities or if
we should force them upon everyone or if one is somehow better or worse
than the other.  As already mentioned, it's also, at least today, far
simpler to disable checksums than to enable them, which is something
else to consider when thinking about what the default should be.

That the major cloud providers all have checksums enabled (at least by
default, though I wonder if they would even let you turn them off..),
even when we don't have them on by default, strikes me as pretty telling
that this is something that we should have on by default.

Certainly there's a different risk profile between the two and there may
be times when someone is fine with running without fsync, or fine
running without checksums, but those are, in my view, exceptions made
once you understand exactly what risk you're willing to accept, and not
what the default or typical deployment should be.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] [PATCH] Generic type subscripting

2021-01-06 Thread Pavel Stehule
Hi

út 5. 1. 2021 v 20:32 odesílatel Dmitry Dolgov <9erthali...@gmail.com>
napsal:

> > On Mon, Jan 04, 2021 at 06:56:17PM +0100, Pavel Stehule wrote:
> > po 4. 1. 2021 v 14:58 odesílatel Dmitry Dolgov <9erthali...@gmail.com>
> > napsal:
> > postgres=# update foo set a['c']['c'][10] = '10';
> > postgres=# update foo set a['c'][10][10] = '10';
>
> Yeah, there was one clumsy memory allocation. On the way I've found and
> fixed another issue with jsonb generation, right now I don't see any
> other problems. But as my imagination, despite all the sci-fi I've read
> this year, is apparently not so versatile, I'll rely on yours, could you
> please check this version again?
>

this case should raise an exception - either the value should be changed or
an error should be raised

postgres=# insert into foo values('{}');
INSERT 0 1
postgres=# update foo set a['a'] = '100';
UPDATE 1
postgres=# select * from foo;
┌┐
│ a  │
╞╡
│ {"a": 100} │
└┘
(1 row)

postgres=# update foo set a['a'][1] = '-1';
UPDATE 1
postgres=# select * from foo;
┌┐
│ a  │
╞╡
│ {"a": 100} │
└┘
(1 row)

Regards

Pavel


Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Peter Geoghegan
On Wed, Jan 6, 2021 at 12:03 PM Stephen Frost  wrote:
> Do you really believe it to be wrong?  Do we stop performing the correct
> write calls in the correct order to the kernel with fsync being off?  If
> the kernel actually handles all of our write calls correctly and we
> cleanly shut down and the kernel cleanly shuts down and sync's the disks
> before a reboot, will there be corruption from running with fsync off?

This is a total straw man. Everyone understands the technical issues
with fsync perfectly well, and everyone understands that everyone
understands the issue, so spare me the "I'm just a humble country
lawyer" style explanation.

What you seem to be arguing is that the differences between disabling
checksums and disabling fsync is basically quantitative, and so making
a qualitative distinction between those two things is meaningless, and
that it logically follows that disagreeing with you is essentially
irresponsible. This is a tactic that would be an embarrassment to a
high school debate team. It's below you.

Your argument may be logically consistent, but it's still nonsense.

-- 
Peter Geoghegan




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Stephen Frost
Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Wed, Jan 6, 2021 at 11:44 AM Stephen Frost  wrote:
> > Having fsync off won't actually cause corruption unless you have an OS
> > crash or don't sync the disks when you reboot the system though- so it's
> > a hedge against certain failure conditions, as is checksums.
> 
> I find this argument baffling. Do you really believe this?

Do you really believe it to be wrong?  Do we stop performing the correct
write calls in the correct order to the kernel with fsync being off?  If
the kernel actually handles all of our write calls correctly and we
cleanly shut down and the kernel cleanly shuts down and sync's the disks
before a reboot, will there be corruption from running with fsync off?

If that's the case, I'd certainly be curious to hear under what
conditions, when everything works, we'll end up with corruption simply
from running with fsync off.

I don't mean to imply that I advocate for such- I'd hope that it would
be clear from this discussion that I'm not suggesting that we turn fsync
off, and rather the opposite, that we have both fsync and data checksums
be on by default, but to claim that having fsync off will always, in
every situation, cause corruption is over-stating the case.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Peter Geoghegan
On Wed, Jan 6, 2021 at 11:44 AM Stephen Frost  wrote:
> Having fsync off won't actually cause corruption unless you have an OS
> crash or don't sync the disks when you reboot the system though- so it's
> a hedge against certain failure conditions, as are checksums.

I find this argument baffling. Do you really believe this?

-- 
Peter Geoghegan




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2021-01-06 13:01:59 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > > imv, enabling page checksums is akin to having fsync enabled by default.
> > > > Does it impact performance?  Yes, surely quite a lot, but it's also the
> > > > safe and sane choice when it comes to defaults.
> > > 
> > > Oh for crying out loud.
> > 
> > Not sure what you're hoping to gain from such comments, but it doesn't
> > do anything to change my opinion.
> 
> It seems so facetious to compare fsync=off (will cause corruption) with
> data_checksums=off (will not cause corruption) that I find the
> comparison to be insulting.

Having fsync off won't actually cause corruption unless you have an OS
crash or don't sync the disks when you reboot the system though- so it's
a hedge against certain failure conditions, as are checksums.
having fsync off on a system and then rebooting it (ungracefully..) will
likely cause corruption and, no, having data checksums turned off won't
cause corruption in that way or at all in its own right- but there's a
decent chance that if there does end up being latent corruption that
it'll at least be detected, which is why so many (including, apparently,
the popular cloud providers) enable it and why we should have it on by
default.

I don't agree that they are so different as you make them out to be.  I
do appreciate that the chances of a random reboot happening are higher
than the chance of a disk failure being detected by a PG checksum (and
not something else first).

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Peter Geoghegan
On Wed, Jan 6, 2021 at 9:55 AM Andres Freund  wrote:
> Vacuum performance is one of *THE* major complaints about
> postgres. Making it run slower by a lot obviously exacerbates that
> problem significantly. I think it'd be prohibitively expensive if it
> were 1.5x, not to even speak of 15x.

+1. I am *strongly* opposed to enabling checksums by default for this
reason. I think that it's a total non-starter, unless and until the
overhead can be dramatically reduced. The fact that it isn't the fault
of the checksum mechanism in some abstract sense is 100% irrelevant.

-- 
Peter Geoghegan




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Andres Freund
Hi,

On 2021-01-06 13:01:59 -0500, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > > imv, enabling page checksums is akin to having fsync enabled by default.
> > > Does it impact performance?  Yes, surely quite a lot, but it's also the
> > > safe and sane choice when it comes to defaults.
> > 
> > Oh for crying out loud.
> 
> Not sure what you're hoping to gain from such comments, but it doesn't
> do anything to change my opinion.

It seems so facetious to compare fsync=off (will cause corruption) with
data_checksums=off (will not cause corruption) that I find the
comparison to be insulting.

Greetings,

Andres Freund




Re: set_config() documentation clarification

2021-01-06 Thread Pavel Stehule
>
>
>
>> SET
>>   g = year % 19,
>>   c = year / 100,
>>   h = (c - c/4 - (8*c + 13)/25 + 19*g + 15) % 30,
>>   i = h - (h/28)*(1 - (h/28)*(29/(h + 1))*((21 - g)/11)),
>>   j = year + year/4 + i + 2 - c + c/4) % 7,
>>   p = i - j,
>>   easter_month = 3 + (p + 26)/30,
>>   easter_day = 1 + (p + 27 + (p + 6)/40) % 31
>> SELECT make_date(year, easter_month, easter_day)
>>
>> or maybe even WITH like this:
>>
>> WITH
>>   year % 19 AS g ,
>>   year / 100 AS c,
>>   (c - c/4 - (8*c + 13)/25 + 19*g + 15) % 30 AS h,
>>   h - (h/28)*(1 - (h/28)*(29/(h + 1))*((21 - g)/11)) AS i,
>>   year + year/4 + i + 2 - c + c/4) % 7 AS j,
>>   i - j AS p,
>>   3 + (p + 26)/30 AS easter_month,
>>   1 + (p + 27 + (p + 6)/40) % 31 AS easter_day
>> SELECT make_date(year, easter_month, easter_day)
>>
>
> I do not think this clause is necessary (because we have PL/pgSQL or C),
> but other people may have different opinions (and it is true that this
> feature can have some performance benefit, because it enhances the
> possibilities of inlined expressions, and custom extensions are
> prohibited in cloud environments (and will be)).  Theoretically the
> implementation of this feature should not be hard, because these variables
> are very local (the scope is just a row), so this is just a game for the
> parser and the expression interpreter. But if you introduce this feature,
> then it is better to use syntax that is used by some other well-known
> system (Oracle or others).
>

The name for this feature can be "row scope variables" and yes, in OLAP
queries there are repeated expressions where this feature can be useful.

postgres=# explain verbose select  make_date(year, easter_month,
easter_day) from (select year,  3 + (p + 26)/30 AS easter_month, 1 + (p +
27 + (p + 6)/40) % 31 AS easter_day from (  select year, i - j AS p from
(select year, i, (year + year/4 + i + 2 - c + c/4) % 7 AS j from (select
year, c, h - (h/28)*(1 - (h/28)*(29/(h + 1))*((21 - g)/11)) AS i from
(select year, g, c, (c - c/4 - (8*c + 13)/25 + 19*g + 15) % 30 AS h  from
(select year, year % 19 as g, year / 100 as c from
generate_series(2019,2020) g(year) offset 0) s1 offset 0) s2 offset 0) s3
offset 0) s4 offset 0) s5 offset 0) s6;
┌─┐
│ QUERY
PLAN  │
╞═╡
│ Subquery Scan on s6  (cost=0.00..0.35 rows=2 width=4)
  │
│   Output: make_date(s6.year, s6.easter_month, s6.easter_day)
   │
│   ->  Subquery Scan on s5  (cost=0.00..0.33 rows=2 width=12)
   │
│ Output: s5.year, (3 + ((s5.p + 26) / 30)), (1 + (((s5.p + 27) +
((s5.p + 6) / 40)) % 31))   │
│ ->  Subquery Scan on s4  (cost=0.00..0.26 rows=2 width=8)
  │
│   Output: s4.year, (s4.i - s4.j)
   │
│   ->  Subquery Scan on s3  (cost=0.00..0.24 rows=2 width=12)
   │
│ Output: s3.year, s3.i, ((s3.year + (s3.year / 4))
+ s3.i) + 2) - s3.c) + (s3.c / 4)) % 7)   │
│ ->  Subquery Scan on s2  (cost=0.00..0.18 rows=2
width=12)  │
│   Output: s2.year, s2.c, (s2.h - ((s2.h / 28) *
(1 - (((s2.h / 28) * (29 / (s2.h + 1))) * ((21 - s2.g) / 11)│
│   ->  Subquery Scan on s1  (cost=0.00..0.10
rows=2 width=16)│
│ Output: s1.year, s1.g, s1.c, (s1.c -
(s1.c / 4)) - (((8 * s1.c) + 13) / 25)) + (19 * s1.g)) + 15) % 30) │
│ ->  Function Scan on
pg_catalog.generate_series g  (cost=0.00..0.03 rows=2 width=12)
   │
│   Output: g.year, (g.year % 19),
(g.year / 100) │
│   Function Call:
generate_series(2019, 2020)
   │
└─┘
(15 rows)



Pavel


Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Michael Banck
Am Mittwoch, den 06.01.2021, 19:07 +0100 schrieb Michael Banck:
> Am Mittwoch, den 06.01.2021, 09:58 -0800 schrieb Andres Freund:
> > It still requires running a binary locally on the DB server, no?  Which
> > means it'll not be an option on most cloud providers...
> 
> At least Azure and RDS seem to have data_checksums on anyway, I don't
> have a GCP test instance around handily right now to check.

Well I was curious: GCP SQL Postgres also has checksums enabled.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz





Re: set_config() documentation clarification

2021-01-06 Thread Pavel Stehule
út 5. 1. 2021 v 22:15 odesílatel Joel Jacobson  napsal:

> On Tue, Jan 5, 2021, at 21:34, Pavel Stehule wrote:
> > yes, it is supported. More the schema variables supports RESET to
> default on transaction end,
> > and supports access rights for usage in security definer functions.
>
> Nice.
>
> > Maybe - I don't know what "Statement variables" :). Other databases do
> not have similarly named features.
>
> I know, I made that name up just to make the connection,
> the name used by other databases is "LET clause",
> and in functional languages such as OCaml and Haskell,
> this concept is called "let expressions".
>
> > There are two concepts (these concepts can be mixed). Maybe - you can
> see there how non tabular objects can be
> > accessed in queries with.
> > ...
>
> Thank you for a detailed explanation, very useful.
>
> >> Also, do you know if Schema variables are part of the SQL standard?
> > ANSI SQL defines modules, and some variables can be declared in module
> scope. Modules are like our schemas with the
> > possibility to define private objects. But I don't know any
> implementation of this part of the standard in some widely used
> > database . It is like a mix of package concepts (Oracle) with schemas,
> because modules can hold private database objects
> > like tables or temporary tables. So my proposed schema variables are not
> part of SQL standard, because related features
> > depend on modules. Functionally it is similar +/-. Personally I don't
> like concepts of modules (or packages) too much. The
> > schemas are a good replacement for 90% and the current system of
> qualified names and search path, that is same for
> > tables and same for procedures, is very simple and good enough). So
> instead of introducing modules, I prefer enhanced
> > schemas about some features like private objects. But it is in the
> category "nice to have" rather than a necessary feature.
>
> This is encouraging to hear, then I will pray there might be hope for LET
> clauses I need,
> even though not being part of the SQL standard.
>
> In another attempt to sell the LET clause feature, imagine OCaml/Haskell
> *without* let expressions,
> where users would be advised to write functions in a different language
> like C,
> to do their complex computations reused at many places, and then return
> the result back to OCaml/Haskell.
> That wouldn't be a very nice user-experience to the OCaml/Haskell user.
>
> I really think a lot of real-life complex SQL code could be simplified a
> lot
> and written much more clear and concise with LET clauses.
>

I have no idea - all my life I have used procedural languages, where this is
not a problem.


> Since using "SET" as the command for Schema variables,
> maybe using SET for LET clause would make the idea less controversial:
>

The schema variables patch (my patch) introduced the LET statement, because
the SET keyword is already used in Postgres for setting GUCs and works on
GUCs. But this fact doesn't block using LET as a new clause.


> SET
>   g = year % 19,
>   c = year / 100,
>   h = (c - c/4 - (8*c + 13)/25 + 19*g + 15) % 30,
>   i = h - (h/28)*(1 - (h/28)*(29/(h + 1))*((21 - g)/11)),
>   j = year + year/4 + i + 2 - c + c/4) % 7,
>   p = i - j,
>   easter_month = 3 + (p + 26)/30,
>   easter_day = 1 + (p + 27 + (p + 6)/40) % 31
> SELECT make_date(year, easter_month, easter_day)
>
> or maybe even WITH like this:
>
> WITH
>   year % 19 AS g ,
>   year / 100 AS c,
>   (c - c/4 - (8*c + 13)/25 + 19*g + 15) % 30 AS h,
>   h - (h/28)*(1 - (h/28)*(29/(h + 1))*((21 - g)/11)) AS i,
>   year + year/4 + i + 2 - c + c/4) % 7 AS j,
>   i - j AS p,
>   3 + (p + 26)/30 AS easter_month,
>   1 + (p + 27 + (p + 6)/40) % 31 AS easter_day
> SELECT make_date(year, easter_month, easter_day)
>

I do not think this clause is necessary (because we have PL/pgSQL or C), but
other people may have different opinions (and it is true that this feature
can have some performance benefit, because it enhances the possibilities
of inlined expressions, and custom extensions are prohibited in cloud
environments (and will be)).  Theoretically the implementation of this
feature should not be hard, because these variables are very local
(the scope is just a row), so this is just a game for the parser and the
expression interpreter. But if you introduce this feature, then it is
better to use syntax that is used by some other well-known system (Oracle
or others).


> I will study SQL code in the wild on Github written by other users to see
> how many %
> that could benefit from this feature.
>

I am sure it can be a very good task for learning PostgreSQL internals -
the parser and executor - and it can be fun work (when I started with
Postgres, I had to modify the same parts).

Regards

Pavel


> Maybe I'm wrong, but my gut feeling says this would be a really good thing,
> and just like like Schema variables, I didn't really know I needed them
> before I saw them.
>
> Best regards,
>
> Joel
>


Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Michael Banck
Am Mittwoch, den 06.01.2021, 09:58 -0800 schrieb Andres Freund:
> It still requires running a binary locally on the DB server, no?  Which
> means it'll not be an option on most cloud providers...

At least Azure and RDS seem to have data_checksums on anyway, I don't
have a GCP test instance around handily right now to check.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz





Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Michael Banck
Hi,

On Wed, Jan 06, 2021 at 09:55:08AM -0800, Andres Freund wrote:
> On 2021-01-06 12:02:40 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > On 2021-01-04 19:11:43 +0100, Michael Banck wrote:
> > > > This looks much better from the WAL size perspective, there's now almost
> > > > no additional WAL. However, that is because pgbench doesn't do TOAST, so
> > > > in a real-world example it might still be quite larger. Also, the vacuum
> > > > runtime is still 15x longer.
> > > 
> > > That's obviously an issue.
> > 
> > It'd certainly be nice to figure out a way to improve the VACUUM run but
> > I don't think the impact on the time to run VACUUM is really a good
> > reason to not move forward with changing the default.
> 
> Vacuum performance is one of *THE* major complaints about
> postgres. Making it run slower by a lot obviously exacerbates that
> problem significantly. I think it'd be prohibitively expensive if it
> were 1.5x, not to even speak of 15x.

To clarify, the vacuum slowdown is just as large in my (somewhat
contrived, worst-case scenario) tests when wal_log_hints is on and
data_checksums is off; I just omitted those numbers because they were
basically identical (or maybe even a bit worse):

|data_checksums=off, wal_log_hints=off:
|
|done in 10.24 s (vacuum 3.31 s, primary keys 6.92 s).
|done in 8.81 s (vacuum 2.72 s, primary keys 6.09 s).
|done in 8.35 s (vacuum 2.32 s, primary keys 6.03 s).
|
|data_checksums=off, wal_log_hints=on:
|
|1,5G   data1/pg_wal
|1,5G   data1/base
|2,5G   data1_archive/
|
|done in 87.89 s (vacuum 69.67 s, primary keys 18.23 s).
|done in 73.71 s (vacuum 60.19 s, primary keys 13.52 s).
|done in 75.12 s (vacuum 62.49 s, primary keys 12.62 s).
|
|data_checksums=on, wal_log_hints=off:
|
|done in 67.42 s (vacuum 54.57 s, primary keys 12.85 s).
|done in 65.03 s (vacuum 53.25 s, primary keys 11.78 s).
|done in 77.57 s (vacuum 62.64 s, primary keys 14.94 s).

Of course, wal_log_hints is not the default either and can be turned off
easily. You mostly lose the ability to run pg_rewind, I think - are there
other use cases for it?


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Magnus Hagander
On Wed, Jan 6, 2021 at 6:58 PM Andres Freund  wrote:
>
> Hi,
>
> On 2021-01-06 18:27:48 +0100, Magnus Hagander wrote:
> > The other argument is that admins can cheaply and quickly turn off
> > checksums if they don't want them.
> >
> > The same cannot be said for turning them *on* again, that's a very
> > slow offline operation at this time.
> >
> > Turning off checksums doesn't take noticeably more time than say
> > changing the shared_buffers from the default, which is also almost
> > guaranteed to be wrong for most installations.
>
> It still requires running a binary locally on the DB server, no?  Which

It does.

So does changing shared_buffers -- for example you need to run
"systemctl" if you're on systemd, or just pg_ctl if you're using
unpackaged Postgres.


> means it'll not be an option on most cloud providers...

I really don't see why.

They've implemented the ability to restart postgres. Surely they can
implement the ability to run a single command in between.

Or if that's too complicated, they are more than capable of passing a
parameter to initdb to change what the default is on their platform.
They already do so for other things (such as not using trust or peer
auth by default, or by actually not having a superuser, etc.).

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2021-01-06 18:27:48 +0100, Magnus Hagander wrote:
> > The other argument is that admins can cheaply and quickly turn off
> > checksums if they don't want them.
> > 
> > The same cannot be said for turning them *on* again, that's a very
> > slow offline operation at this time.
> > 
> > Turning off checksums doesn't take noticeably more time than say
> > changing the shared_buffers from the default, which is also almost
> > guaranteed to be wrong for most installations.
> 
> It still requires running a binary locally on the DB server, no?  Which
> means it'll not be an option on most cloud providers...

... unless they choose to make it an option, which is entirely up to
them and certainly well within what they're capable of doing.  I'd also
mention that, at least according to some cloud providers I've talked to,
they specifically wouldn't support PG until data checksums were
available, making me not really feel like having them enabled by default
would be such an issue (not to mention that, clearly, cloud providers
could choose to change the default for their deployments if they wished
to).

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2021-01-06 12:02:40 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > On 2021-01-04 19:11:43 +0100, Michael Banck wrote:
> > > > Am Samstag, den 02.01.2021, 10:47 -0500 schrieb Stephen Frost:
> > > > > I agree with this, but I'd also like to propose, again, as has been
> > > > > discussed a few times, making it the default too.
> > > 
> > > FWIW, I am quite doubtful we're there performance-wise. Besides the WAL
> > > logging overhead, the copy we do via PageSetChecksumCopy() shows up
> > > quite significantly in profiles here. Together with the checksums
> > > computation that's *halfing* write throughput on fast drives in my aio
> > > branch.
> > 
> > Our defaults are not going to win any performance trophies and so I
> > don't see the value in stressing over it here.
> 
> Meh^3. There's a difference between defaults that are about resource
> usage (e.g. shared_buffers) and defaults that aren't.

fsync isn't about resource usage.

> > > > This looks much better from the WAL size perspective, there's now almost
> > > > no additional WAL. However, that is because pgbench doesn't do TOAST, so
> > > > in a real-world example it might still be quite larger. Also, the vacuum
> > > > runtime is still 15x longer.
> > > 
> > > That's obviously an issue.
> > 
> > It'd certainly be nice to figure out a way to improve the VACUUM run but
> > I don't think the impact on the time to run VACUUM is really a good
> > reason to not move forward with changing the default.
> 
> Vacuum performance is one of *THE* major complaints about
> postgres. Making it run slower by a lot obviously exacerbates that
> problem significantly. I think it'd be prohibitively expensive if it
> were 1.5x, not to even speak of 15x.

We already make vacuum, when run from autovacuum, relatively slow,
quite intentionally.  If someone's having trouble with vacuum run times
they're going to be adjusting the configuration anyway.

> > imv, enabling page checksums is akin to having fsync enabled by default.
> > Does it impact performance?  Yes, surely quite a lot, but it's also the
> > safe and sane choice when it comes to defaults.
> 
> Oh for crying out loud.

Not sure what you're hoping to gain from such comments, but it doesn't
do anything to change my opinion.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Andres Freund
Hi,

On 2021-01-06 18:27:48 +0100, Magnus Hagander wrote:
> The other argument is that admins can cheaply and quickly turn off
> checksums if they don't want them.
> 
> The same cannot be said for turning them *on* again, that's a very
> slow offline operation at this time.
> 
> Turning off checksums doesn't take noticeably more time than say
> changing the shared_buffers from the default, which is also almost
> guaranteed to be wrong for most installations.

It still requires running a binary locally on the DB server, no?  Which
means it'll not be an option on most cloud providers...

Greetings,

Andres Freund




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Andres Freund
Hi,

On 2021-01-06 12:02:40 -0500, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2021-01-04 19:11:43 +0100, Michael Banck wrote:
> > > Am Samstag, den 02.01.2021, 10:47 -0500 schrieb Stephen Frost:
> > > > I agree with this, but I'd also like to propose, again, as has been
> > > > discussed a few times, making it the default too.
> > 
> > FWIW, I am quite doubtful we're there performance-wise. Besides the WAL
> > logging overhead, the copy we do via PageSetChecksumCopy() shows up
> > quite significantly in profiles here. Together with the checksums
> > computation that's *halfing* write throughput on fast drives in my aio
> > branch.
> 
> Our defaults are not going to win any performance trophies and so I
> don't see the value in stressing over it here.

Meh^3. There's a difference between defaults that are about resource
usage (e.g. shared_buffers) and defaults that aren't.


> > > This looks much better from the WAL size perspective, there's now almost
> > > no additional WAL. However, that is because pgbench doesn't do TOAST, so
> > > in a real-world example it might still be quite larger. Also, the vacuum
> > > runtime is still 15x longer.
> > 
> > That's obviously an issue.
> 
> It'd certainly be nice to figure out a way to improve the VACUUM run but
> I don't think the impact on the time to run VACUUM is really a good
> reason to not move forward with changing the default.

Vacuum performance is one of *THE* major complaints about
postgres. Making it run slower by a lot obviously exacerbates that
problem significantly. I think it'd be prohibitively expensive if it
were 1.5x, not to even speak of 15x.



> imv, enabling page checksums is akin to having fsync enabled by default.
> Does it impact performance?  Yes, surely quite a lot, but it's also the
> safe and sane choice when it comes to defaults.

Oh for crying out loud.


Greetings,

Andres Freund




Re: Enhance traceability of wal_level changes for backup management

2021-01-06 Thread Stephen Frost
Greetings,

* osumi.takami...@fujitsu.com (osumi.takami...@fujitsu.com) wrote:
> You said 
> > The use case I imagined is that the user temporarily
> > changes wal_level to 'none' from 'replica' or 'logical' to speed up loading 
> > and
> > changes back to the normal. In this case, the backups taken before the
> > wal_level change cannot be used to restore the database to the point after 
> > the
> > wal_level change. So I think backup management tools would want to
> > recognize the time or LSN when/where wal_level is changed to ‘none’ in order
> > to do some actions such as invalidating older backups, re-calculating backup
> > redundancy etc.
> > Actually the same is true when the user changes to ‘minimal’. The tools 
> > would
> > need to recognize the time or LSN in this case too. Since temporarily 
> > changing
> > wal_level has been an uncommon use case some tools perhaps are not aware
> > of that yet. But since introducing wal_level = ’none’ could make the change
> > common, I think we need to provide a way for the tools.

I continue to be against the idea of introducing another wal_level.  If
there's additional things we can do to reduce WAL traffic while we
continue to use it to keep the system in a consistent state then we
should implement those for the 'minimal' case and be done with it.
Adding another wal_level is just going to be confusing and increase
complexity, and the chances that someone will end up in a bad state.

> I wondered, couldn't backup management tools utilize the information
> in the backup that is needed to be made when wal_level is changed to "none" 
> for example ?

What information is that, exactly?  If there's a way to detect that the
wal_level has been flipped to 'minimal' and then back to 'replica',
other than scanning the WAL, we'd certainly like to hear of it, so we
can implement logic in pgbackrest to detect that happening.  I'll admit
that I've not gone hunting for such of late, but I don't recall seeing
anything that would change that either.

The idea proposed above about having the LSN and time recorded when a
wal_level change to minimal happens, presumably in pg_control, seems
like it could work as a way to allow external tools to more easily
figure out if the wal_level's been changed to minimal.  Perhaps there's
a reason to track changes between replica and logical but I can't think
of any offhand and backup tools would still need to know if the wal
level was set to *minimal* independently of changes between replica and
logical.

Then again, once we add support for scanning the WAL to pgbackrest,
we'll almost certainly track it that way since that'll work for older
and released versions of PG, so I'm not really sure it's worth it to add
this to pg_control unless there's other reasons to.

> As I said before, existing backup management tools support
> only wal_level=replica or logical at present. And, if they would wish to 
> alter the
> status quo and want to cover the changes over wal_levels, I felt it's natural 
> that
> they support a feature like taking a full backup, triggered by the wal_level 
> changes (or server stop).

Sure, but there needs to be a way to actually do that..

> This is because taking a backup is a must for wal_level=none,
> as I described in the patch of wal_level=none.
> For example, they could prepare an easy way to
> take an offline physical backup when the server stops for changing the 
> wal_level.
> (Here, they can support the change to minimal, too.)

pgbackrest does support offline physical backups and it's pretty easy
(just pass in --no-online).  That doesn't really help with the issue of
detecting a wal_level change though.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Add Information during standby recovery conflicts

2021-01-06 Thread Fujii Masao



On 2020/12/15 0:20, Fujii Masao wrote:



On 2020/12/14 21:31, Fujii Masao wrote:



On 2020/12/05 12:38, Masahiko Sawada wrote:

On Fri, Dec 4, 2020 at 7:22 PM Drouvot, Bertrand  wrote:


Hi,

On 12/4/20 2:21 AM, Fujii Masao wrote:


On 2020/12/04 9:28, Masahiko Sawada wrote:

On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao
 wrote:




On 2020/12/01 17:29, Drouvot, Bertrand wrote:

Hi,

On 12/1/20 12:35 AM, Masahiko Sawada wrote:




On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera
 wrote:

On 2020-Dec-01, Fujii Masao wrote:


+ if (proc)
+ {
+ if (nprocs == 0)
+ appendStringInfo(&buf, "%d", proc->pid);
+ else
+ appendStringInfo(&buf, ", %d", proc->pid);
+
+ nprocs++;

What happens if all the backends in wait_list have gone? In
other words,
how should we handle the case where nprocs == 0 (i.e., nprocs
has not been
incremented at all)? This would very rarely happen, but can happen.
In this case, since buf.data is empty, at least there seems no
need to log
the list of conflicting processes in detail message.

Yes, I noticed this too; this can be simplified by changing the
condition in the ereport() call to be "nprocs > 0" (rather than
wait_list being null), otherwise not print the errdetail.  (You
could
test buf.data or buf.len instead, but that seems uglier to me.)

+1

Maybe we can also improve the comment of this function from:

+ * This function also reports the details about the conflicting
+ * process ids if *wait_list is not NULL.

to " This function also reports the details about the conflicting
process ids if exist" or something.


Thank you all for the review/remarks.

They have been addressed in the new attached patch version.


Thanks for updating the patch! I read through the patch again
and applied the following chages to it. Attached is the updated
version of the patch. Could you review this version? If there is
no issue in it, I'm thinking to commit this version.


Thank you for updating the patch! I have one question.



+   timeouts[cnt].id = STANDBY_TIMEOUT;
+   timeouts[cnt].type = TMPARAM_AFTER;
+   timeouts[cnt].delay_ms = DeadlockTimeout;

Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here?
I changed the code that way.


As the comment of ResolveRecoveryConflictWithLock() says the
following, a deadlock is detected by the ordinary backend process:

   * Deadlocks involving the Startup process and an ordinary backend process
   * will be detected by the deadlock detector within the ordinary backend.

If we use STANDBY_DEADLOCK_TIMEOUT,
SendRecoveryConflictWithBufferPin() will be called after
DeadlockTimeout passed, but I think it's not necessary for the startup
process in this case.


Thanks for pointing this! You are right.



If we want to just wake up the startup process
maybe we can use STANDBY_TIMEOUT here?



Thanks for the patch updates! Except what we are still discussing below,
it looks good to me.


When STANDBY_TIMEOUT happens, a request to release conflicting buffer
pins is sent. Right? If so, we should not use STANDBY_TIMEOUT there either, should we?


Agree



Or, first of all, we don't need to enable the deadlock timer at all?
Since what we'd like to do is to wake up after deadlock_timeout
passes, we can do that by changing ProcWaitForSignal() so that it can
accept the timeout and giving the deadlock_timeout to it. If we do
this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from
ResolveRecoveryConflictWithLock(). Thought?


Where do we enable deadlock timeout in hot standby case? You meant to
enable it in ProcWaitForSignal() or where we set a timer for not hot
standby case, in ProcSleep()?


No, what I tried to say is to change ResolveRecoveryConflictWithLock() so that 
it does

1. calculate the "minimum" timeout from deadlock_timeout and 
max_standby_xxx_delay
2. give the calculated timeout value to ProcWaitForSignal()
3. wait for signal and timeout on ProcWaitForSignal()

Since ProcWaitForSignal() calls WaitLatch(), it seems it's not so difficult to 
make ProcWaitForSignal() handle the timeout. If we do this, I was thinking that 
we can get rid of enable_timeouts() from ResolveRecoveryConflictWithLock().
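
To sketch the idea (a rough illustration only, not the actual patch; it assumes
ProcWaitForSignal() is extended to accept a timeout in milliseconds):

    /* hypothetical sketch inside ResolveRecoveryConflictWithLock() */
    TimestampTz ltime = GetStandbyLimitTime();  /* limit from max_standby_*_delay */
    long        msecs = DeadlockTimeout;        /* start from deadlock_timeout */

    if (ltime != 0)
    {
        long    secs;
        int     usecs;

        TimestampDifference(GetCurrentTimestamp(), ltime, &secs, &usecs);
        msecs = Min(msecs, secs * 1000 + usecs / 1000);
    }

    /* wait for the conflicting lock holders to signal us, or for the timeout */
    ProcWaitForSignal(PG_WAIT_LOCK | locktag.locktag_type, msecs);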






Why not simply use (again) the STANDBY_LOCK_TIMEOUT one? (as it triggers
a call to StandbyLockTimeoutHandler() which does nothing, except waking
up. That's what we want, right?)


Right, what I meant is STANDBY_LOCK_TIMEOUT. The startup
process can wake up and do nothing. Thank you for pointing that out.


Okay, understood! Firstly I was thinking that enabling the same type (i.e., 
STANDBY_LOCK_TIMEOUT) of lock twice doesn't work properly, but as far as I read 
the code, it works. In that case, only the shorter timeout would be 

Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Magnus Hagander
On Wed, Jan 6, 2021 at 8:31 AM Michael Banck  wrote:
>
> Hi,
>
> Am Mittwoch, den 06.01.2021, 10:52 +0900 schrieb Michael Paquier:
> > On Mon, Jan 04, 2021 at 07:11:43PM +0100, Michael Banck wrote:
> > > So maybe we should switch on wal_compression if we enable data checksums
> > > by default.
> >
> > I don't agree with this assumption.  In some CPU-bounded workloads, I
> > have seen that wal_compression = on leads to performance degradation
> > with or without checksums enabled.
>
> I meant just flipping the default, admins could still turn off
> wal_compression if they think it'd help their performance. But it might
> be tricky to implement, not sure.

The other argument is that admins can cheaply and quickly turn off
checksums if they don't want them.

The same cannot be said for turning them *on* again, that's a very
slow offline operation at this time.

Turning off checksums doesn't take noticeably more time than say
changing the shared_buffers from the default, which is also almost
guaranteed to be wrong for most installations.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Phrase search vs. multi-lexeme tokens

2021-01-06 Thread Tom Lane
Alexander Korotkov  writes:
> # select to_tsvector('pg_class foo') @@ websearch_to_tsquery('"pg_class 
> foo"');
>  ?column?
> --
>  f

Yeah, surely this is wrong.

> # select to_tsquery('pg_class <-> foo');
>   to_tsquery
> --
>  ( 'pg' & 'class' ) <-> 'foo'

> I think if a user writes 'pg_class <-> foo', then it's expected to
> match 'pg_class foo' independently on which lexemes 'pg_class' is
> split into.

Indeed.  It seems to me that this:

regression=# select to_tsquery('pg_class');
   to_tsquery   

 'pg' & 'class'
(1 row)

is wrong all by itself.  Now that we have phrase search, a much
saner translation would be "'pg' <-> 'class'".  If we fixed that
then it seems like the more complex case would just work.

I read your patch over quickly and it seems like a reasonable
approach (but sadly underdocumented).  Can we extend the idea
to fix the to_tsquery case?

regards, tom lane




Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Wed, Jan  6, 2021 at 12:02:40PM -0500, Stephen Frost wrote:
> > > It unfortunately also hurts other workloads. If we moved towards a saner
> > > compression algorithm that'd perhaps not be an issue anymore...
> > 
> > I agree that improving compression performance would be good but I don't
> > see that as relevant to the question of what our defaults should be.
> > 
> > imv, enabling page checksums is akin to having fsync enabled by default.
> > Does it impact performance?  Yes, surely quite a lot, but it's also the
> > safe and sane choice when it comes to defaults.
> 
> Well, you know fsyncs are required to recover from an OS crash, which is
> more likely than detecting data corruption.

Yes, I do know that.  That doesn't change my feeling that we should have
checksums enabled by default.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Bruce Momjian
On Wed, Jan  6, 2021 at 12:02:40PM -0500, Stephen Frost wrote:
> > It unfortunately also hurts other workloads. If we moved towards a saner
> > compression algorithm that'd perhaps not be an issue anymore...
> 
> I agree that improving compression performance would be good but I don't
> see that as relevant to the question of what our defaults should be.
> 
> imv, enabling page checksums is akin to having fsync enabled by default.
> Does it impact performance?  Yes, surely quite a lot, but it's also the
> safe and sane choice when it comes to defaults.

Well, you know fsyncs are required to recover from an OS crash, which is
more likely than detecting data corruption.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)

2021-01-06 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2021-01-04 19:11:43 +0100, Michael Banck wrote:
> > Am Samstag, den 02.01.2021, 10:47 -0500 schrieb Stephen Frost:
> > > * Michael Paquier (mich...@paquier.xyz) wrote:
> > > > On Fri, Jan 01, 2021 at 08:34:34PM +0100, Michael Banck wrote:
> > > > > I think enough people use data checksums these days that it warrants 
> > > > > to
> > > > > be moved into the "normal part", like in the attached.
> > > > 
> > > > +1.  Let's see first what others think about this change.
> > > 
> > > I agree with this, but I'd also like to propose, again, as has been
> > > discussed a few times, making it the default too.
> 
> FWIW, I am quite doubtful we're there performance-wise. Besides the WAL
> logging overhead, the copy we do via PageSetChecksumCopy() shows up
> quite significantly in profiles here. Together with the checksums
> computation that's *halfing* write throughput on fast drives in my aio
> branch.

Our defaults are not going to win any performance trophies and so I
don't see the value in stressing over it here.

> > This looks much better from the WAL size perspective, there's now almost
> > no additional WAL. However, that is because pgbench doesn't do TOAST, so
> > in a real-world example it might still be quite larger. Also, the vacuum
> > runtime is still 15x longer.
> 
> That's obviously an issue.

It'd certainly be nice to figure out a way to improve the VACUUM run but
I don't think the impact on the time to run VACUUM is really a good
reason to not move forward with changing the default.

> > So maybe we should switch on wal_compression if we enable data checksums
> > by default.

That does seem like a good idea to me, +1 to also changing that.

> It unfortunately also hurts other workloads. If we moved towards a saner
> compression algorithm that'd perhaps not be an issue anymore...

I agree that improving compression performance would be good but I don't
see that as relevant to the question of what our defaults should be.

imv, enabling page checksums is akin to having fsync enabled by default.
Does it impact performance?  Yes, surely quite a lot, but it's also the
safe and sane choice when it comes to defaults.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: PoC/WIP: Extended statistics on expressions

2021-01-06 Thread Dean Rasheed
Looking over the statscmds.c changes, there are a few XXX's and
FIXME's that need resolving, and I had a couple of other minor
comments:

+   /*
+* An expression using mutable functions is probably wrong,
+* since if you aren't going to get the same result for the
+* same data every time, it's not clear what the index entries
+* mean at all.
+*/
+   if (CheckMutability((Expr *) expr))
+   ereport(ERROR,

That comment is presumably copied from the index code, so needs updating.


+   /*
+* Disallow data types without a less-than operator
+*
+* XXX Maybe allow this, but only for EXPRESSIONS stats and
+* prevent building e.g. MCV etc.
+*/
+   atttype = exprType(expr);
+   type = lookup_type_cache(atttype, TYPECACHE_LT_OPR);
+   if (type->lt_opr == InvalidOid)
+   ereport(ERROR,
+   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+errmsg("expression cannot be used in
statistics because its type %s has no default btree operator class",
+   format_type_be(atttype;

As the comment suggests, it's probably worth skipping this check if
numcols is 1 so that single-column stats can be built for more types
of expressions. (I'm assuming that it's basically no more effort to
make that work, so I think it falls into the might-as-well-do-it
category.)
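
For illustration, the kind of change I have in mind is simply to guard the
existing check (untested sketch):

    /* Only insist on a default btree opclass for multi-column statistics */
    if (numcols > 1)
    {
        atttype = exprType(expr);
        type = lookup_type_cache(atttype, TYPECACHE_LT_OPR);
        if (type->lt_opr == InvalidOid)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("expression cannot be used in statistics because its type %s has no default btree operator class",
                            format_type_be(atttype))));
    }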


+   /*
+* Parse the statistics kinds.  Firstly, check that this is not the
+* variant building statistics for a single expression, in which case
+* we don't allow specifying any statistis kinds.  The simple variant
+* only has one expression, and does not allow statistics kinds.
+*/
+   if ((list_length(stmt->exprs) == 1) && (list_length(stxexprs) == 1))
+   {

Typo: "statistis"
Nit-picking, this test could just be:

+   if ((numcols == 1) && (list_length(stxexprs) == 1))

which IMO is a little more readable, and matches a similar test a
little further down.


+   /*
+* If there are no simply-referenced columns, give the statistics an
+* auto dependency on the whole table.  In most cases, this will
+* be redundant, but it might not be if the statistics expressions
+* contain no Vars (which might seem strange but possible).
+*
+* XXX This is copied from index_create, not sure if it's applicable
+* to extended statistics too.
+*/

Seems right to me.


+   /*
+* FIXME use 'expr' for expressions, which have empty column names.
+* For indexes this is handled in ChooseIndexColumnNames, but we
+* have no such function for stats.
+*/
+   if (!name)
+   name = "expr";

In theory, this function could be made to duplicate the logic used for
indexes, creating names like "expr1", "expr2", etc. To be honest
though, I don't think it's worth the effort. The code for indexes
isn't really bulletproof anyway -- for example there might be a column
called "expr" that is or isn't included in the index, which would make
the generated name ambiguous. And in any case, a name like
"tbl_cola_expr_colb_expr1_colc_stat" isn't really any more useful than
"tbl_cola_expr_colb_expr_colc_stat". So I'd be tempted to leave that
code as it is.


+
+/*
+ * CheckMutability
+ * Test whether given expression is mutable
+ *
+ * FIXME copied from indexcmds.c, maybe use some shared function?
+ */
+static bool
+CheckMutability(Expr *expr)
+{

As the comment says, it's quite messy duplicating this code, but I'm
wondering whether it would be OK to just skip this check entirely. I
think someone else suggested that elsewhere, and I think it might not
be a bad idea.

For indexes, it could easily lead to wrong query results, but for
stats the most likely problem is that the stats would get out of date
(which they tend to do all by themselves anyway) and need rebuilding.

If you ignore intentionally crazy examples (which are still possible
even with this check), then there are probably many legitimate cases
where someone might want to use non-immutable functions in stats, and
this check just forces them to create an immutable wrapper function.

Regards,
Dean




RE: [Patch] Optimize dropping of relation buffers using dlist

2021-01-06 Thread Tang, Haiying
Hi Kirk,

>And if you want to test, I have already indicated the detailed steps including 
>the scripts I used. Have fun testing!

Thank you for sharing the test steps and scripts. I'd like to take a look at 
them and redo some of the tests on my machine. I'll send my test results in a 
separate email after this.

Regards,
Tang




Re: Incorrect allocation handling for cryptohash functions with OpenSSL

2021-01-06 Thread Heikki Linnakangas

On 25/12/2020 02:57, Michael Paquier wrote:

On Mon, Dec 21, 2020 at 04:28:26PM -0500, Robert Haas wrote:

TBH, I think there's no point in return an error here at all, because
it's totally non-specific. You have no idea what failed, just that
something failed. Blech. If we want to check that ctx is non-NULL, we
should do that with an Assert(). Complicating the code with error
checks that have to be added in multiple places that are far removed
from where the actual problem was detected stinks.


You could technically do that, but only for the backend at the cost of
painting the code of src/common/ with more #ifdef FRONTEND.  Even if
we do that, enforcing an error in the backend could be a problem when
it comes to some code paths.


Yeah, you would still need to remember to check for the error in 
frontend code. Maybe it would still be a good idea, not sure. It would 
be a nice backstop, if you forget to check for the error.


I had a quick look at the callers:

contrib/pgcrypto/internal-sha2.c and 
src/backend/utils/adt/cryptohashfuncs.c: the call to 
pg_cryptohash_create() is missing check for NULL result. With your 
latest patch, that's OK because the subsequent pg_cryptohash_update() 
calls will fail if passed a NULL context. But seems sloppy.
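
For the backend callers, something along these lines would be more defensive
(just a sketch; data, len and digest stand in for the caller's variables):

    pg_cryptohash_ctx *ctx;

    ctx = pg_cryptohash_create(PG_SHA256);
    if (ctx == NULL)
        elog(ERROR, "out of memory");

    if (pg_cryptohash_init(ctx) < 0 ||
        pg_cryptohash_update(ctx, data, len) < 0 ||
        pg_cryptohash_final(ctx, digest) < 0)
    {
        pg_cryptohash_free(ctx);
        elog(ERROR, "could not compute SHA-256 hash");
    }
    pg_cryptohash_free(ctx);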


contrib/pgcrypto/internal.c: all the calls to pg_cryptohash_* functions 
are missing checks for error return codes.


contrib/uuid-ossp/uuid-ossp.c: uses pg_cryptohash for MD5, but borrows 
the built-in implementation of SHA1 on some platforms. Should we add 
support for SHA1 in pg_cryptohash and use that for consistency?


src/backend/libpq/auth-scram.c: SHA256 is used in the mock 
authentication. If the pg_cryptohash functions fail, we throw a distinct 
error "could not encode salt" that reveals that it was a mock 
authentication. I don't think this is a big deal in practice, it would 
be hard for an attacker to induce the SHA256 computation to fail, and 
there are probably better ways to distinguish mock authentication from 
real, like timing attacks. But still.


src/include/common/checksum_helper.h: in pg_checksum_raw_context, do we 
still need separate fields for the different sha variants?


- Heikki




Re: Moving other hex functions to /common

2021-01-06 Thread Bruce Momjian
On Wed, Jan  6, 2021 at 01:10:28PM +0900, Michael Paquier wrote:
> > It was very confusing, and this attached patch fixes all of that.  I
> > also added the pg_ prefix you suggested.  If we want to add dstlen to
> > all the functions, we have to do it for all types --- not sure it is
> > worth it, now that things are much clearer.
> 
> Overflow protection is very important in my opinion here once we
> expose more those functions.  Not touching ECPG at all and not making
> this refactoring pluggable into shared libs is fine by me at the end
> because we don't have a case for libpq yet, but I object to the lack
> of protection against overflows.

Fine.  Do you want to add the overflow to the patch I posted, for all
encoding types?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Incorrect allocation handling for cryptohash functions with OpenSSL

2021-01-06 Thread Heikki Linnakangas

On 06/01/2021 13:42, Michael Paquier wrote:

On Sat, Dec 19, 2020 at 03:13:50PM +0900, Michael Paquier wrote:

At the same time, I have taken care of your comment from upthread to
return a failure if the caller passes NULL for the context, and
adjusted some comments.  What do you think of the attached?


I have looked again at this thread with a fresher mind and I did not
see a problem with the previous patch, except some indentation
issues.  So if there are no objections, I'd like to commit the
attached.


Looks fine to me.

- Heikki




[PATCH] Partial foreign key updates in referential integrity triggers

2021-01-06 Thread Paul Martinez
Hey, hackers!

I've created a patch to better support referential integrity constraints when
using composite primary and foreign keys. This patch allows creating a foreign
key using the syntax:

  FOREIGN KEY (tenant_id, fk_id) REFERENCES fktable ON DELETE SET NULL (fk_id)

which means that only the fk_id column will be set to NULL when the referenced
row is deleted, rather than both the tenant_id and fk_id columns.

In multi-tenant applications, it is common to denormalize a "tenant_id" column
across every table, and use composite primary keys of the form (tenant_id, id)
and composite foreign keys of the form (tenant_id, fk_id), reusing the
tenant_id column in both the primary and foreign key.

This is often done initially for performance reasons, but has the added benefit
of making it impossible for data from one tenant to reference data from another
tenant, also making this a good decision from a security perspective.

Unfortunately, one downside of using composite foreign keys in such a manner
is that commonly used referential actions, such as ON DELETE SET NULL, no
longer work, because Postgres tries to set all of the referencing columns to
NULL, including the columns that overlap with the primary key:


CREATE TABLE tenants (id serial PRIMARY KEY);
CREATE TABLE users (
  tenant_id int REFERENCES tenants ON DELETE CASCADE,
  id serial,
  PRIMARY KEY (tenant_id, id)
);
CREATE TABLE posts (
tenant_id int REFERENCES tenants ON DELETE CASCADE,
id serial,
author_id int,
PRIMARY KEY (tenant_id, id),
FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL
);

INSERT INTO tenants VALUES (1);
INSERT INTO users VALUES (1, 101);
INSERT INTO posts VALUES (1, 201, 101);
DELETE FROM users WHERE id = 101;
ERROR:  null value in column "tenant_id" violates not-null constraint
DETAIL:  Failing row contains (null, 201, null).


This can be resolved by manually creating triggers on the referenced table, but
this is clunky and adds a significant amount of extra work when adding (or
removing!) foreign keys. Users shouldn't have to compromise on maintenance
overhead when using composite foreign keys.

I have implemented a simple extension to the syntax for foreign keys that
makes it just as easy to support referential integrity constraints for
composite foreign keys as it is for single column foreign keys. The SET NULL
and SET DEFAULT referential actions can now be optionally followed by a column
list:

key_action:
  | NO ACTION
  | RESTRICT
  | CASCADE
  | SET NULL [ (column_name [, ...] ) ]
  | SET DEFAULT [ (column_name [, ...] ) ]

When a column list is provided, only the specified columns, which must be a
subset of the referencing columns, will be updated in the associated trigger.

Note that use of SET NULL (col1, ...) on a composite foreign key with MATCH
FULL specified will still raise an error. In such a scenario we could raise an
error when the user tries to define the foreign key, but we don't raise a
similar error when a user tries to use SET NULL on a non-nullable column, so I
don't think this is critical. (I haven't added this check in my patch.) While
this feature would mostly be used with the default MATCH SIMPLE, I could
imagine using SET DEFAULT (col, ...) even for MATCH FULL foreign keys.

To store this additional data, I've added two columns to the pg_constraint
catalog entry:

confupdsetcols int2[]
confdelsetcols int2[]

These columns store the attribute numbers of the columns to update in the
ON UPDATE and ON DELETE triggers respectively. If the arrays are empty, then
all of the referencing columns should be updated.


I previously proposed this feature about a year ago [1], but I don't feel like
the arguments against it were very strong. Wanting to get more familiar with the
Postgres codebase I decided to implement the feature over this holiday break,
and I've gotten everything working and put together a complete patch including
tests and updates to documentation. Hopefully if people find it useful it can
make its way into the next commitfest!

Visual diff:
https://github.com/postgres/postgres/compare/master...PaulJuliusMartinez:on-upd-del-set-cols

Here's a rough outline of the changes:

src/backend/parser/gram.y | 122
src/include/nodes/parsenodes.h|   3
src/backend/nodes/copyfuncs.c |   2
src/backend/nodes/equalfuncs.c|   2
src/backend/nodes/outfuncs.c  |  47
- Modify grammar to add opt_column_list after SET NULL and SET DEFAULT
- Add fk_{upd,del}_set_cols fields to Constraint struct
- Add proper node support, as well as outfuncs support for AlterTableStmt,
  which I used while debugging

src/include/catalog/pg_constraint.h   |  20
src/backend/catalog/pg_constraint.c   |  80
src/include/catalog/catversion.h  |   2
- Add confupdsetcols and confdelsetcols to pg_constraint catalog entry

src/backend/commands/tablecmds.c  | 142
- Pass along data from parsed Constraint node to 

RE: [Patch] Optimize dropping of relation buffers using dlist

2021-01-06 Thread k.jami...@fujitsu.com
On Wed, January 6, 2021 7:04 PM (JST), I wrote:
> I will resume testing in the same way as Tang, because she also executed the original
> failover test which I had been doing before.
> To avoid confusion and to check whether my results and Tang's are
> consistent, I also did the recovery/failover test for VACUUM on a single relation,
> which I will send in a separate email after this.

A. Test to find the right THRESHOLD

So below are the procedures and results of the VACUUM recovery performance
test on single relation.
I followed the advice below and applied the supplementary patch on top of V39:
Test-for-threshold.patch
This will ensure that we'll always enter the optimized path.
We're gonna use the threshold then as the relation size.

> >One idea could be to remove "nBlocksToInvalidate < 
> >BUF_DROP_FULL_SCAN_THRESHOLD" part of check "if (cached && 
> >nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD)" so that it 
> >always use optimized path for the tests. Then use the relation size 
> >as NBuffers/128, NBuffers/256, NBuffers/512 for different values of 
> >shared buffers as 128MB, 1GB, 20GB, 100GB.

Each relation size is NBuffers/XXX, so I used the attached "rel.sh" script
to test relation sizes from NBuffers/512 up to NBuffers/8 for each shared_buffers setting.
I did not go beyond NBuffers/8 because it took too much time, and I could
already observe significant results by that point.

[Vacuum Recovery Performance on Single Relation]
1. Setup synchronous streaming replication. I used the configuration
  written at the bottom of this email.
2. [Primary] Create 1 table. (rel.sh create)
3. [Primary] Insert data of NBuffers/XXX size. Make sure to use the correct
   size for the set shared_buffers by commenting out the right size in "insert"
   of rel.sh script. (rel.sh insert)
4. [Primary] Delete table. (rel.sh delete)
5. [Standby] Optional: To double-check that DELETE is reflected on standby.
   SELECT count(*) FROM tableXXX;
  Make sure it returns 0.
6. [Standby] Pause WAL replay. (rel.sh pause)
  (This script will execute SELECT pg_wal_replay_pause(); .)
7. [Primary] VACUUM the single relation. (rel.sh vacuum)
8. [Primary] After the vacuum finishes, stop the server. (rel.sh stop)
   (The script will execute pg_ctl stop -D $PGDATA -w -mi)
9. [Standby] Resume WAL replay and promote the standby.
  (rel.sh resume)
  It basically prints a timestamp when resuming WAL replay,
  and prints another timestamp when the promotion is done.
  Compute the time difference.

[Results for VACUUM on single relation]
Average of 5 runs.

1. % REGRESSION
% Regression: (patched - master)/master

| rel_size | 128MB  | 1GB| 20GB   | 100GB| 
|--||||--| 
| NB/512   | 0.000% | 0.000% | 0.000% | -32.680% | 
| NB/256   | 0.000% | 0.000% | 0.000% | 0.000%   | 
| NB/128   | 0.000% | 0.000% | 0.000% | -16.502% | 
| NB/64| 0.000% | 0.000% | 0.000% | -9.841%  | 
| NB/32| 0.000% | 0.000% | 0.000% | -6.219%  | 
| NB/16| 0.000% | 0.000% | 0.000% | 3.323%   | 
| NB/8 | 0.000% | 0.000% | 0.000% | 8.178%   |

For 100GB shared_buffers, we can observe regression
beyond NBuffers/32. So with this, we can conclude
that NBuffers/32 is the right threshold.
For NBuffers/16 and beyond, the patched version performs
worse than master. In other words, the cost of finding
the buffers to be invalidated gets higher in the optimized path
than in the traditional path.

So in attached V39 patches, I have updated the threshold
BUF_DROP_FULL_SCAN_THRESHOLD to NBuffers/32.

2. [PATCHED]
Units: Seconds

| rel_size | 128MB | 1GB   | 20GB  | 100GB | 
|--|---|---|---|---| 
| NB/512   | 0.106 | 0.106 | 0.106 | 0.206 | 
| NB/256   | 0.106 | 0.106 | 0.106 | 0.306 | 
| NB/128   | 0.106 | 0.106 | 0.206 | 0.506 | 
| NB/64| 0.106 | 0.106 | 0.306 | 0.907 | 
| NB/32| 0.106 | 0.106 | 0.406 | 1.508 | 
| NB/16| 0.106 | 0.106 | 0.706 | 3.109 | 
| NB/8 | 0.106 | 0.106 | 1.307 | 6.614 |

3. MASTER
Units: Seconds

| rel_size | 128MB | 1GB   | 20GB  | 100GB | 
|--|---|---|---|---| 
| NB/512   | 0.106 | 0.106 | 0.106 | 0.306 | 
| NB/256   | 0.106 | 0.106 | 0.106 | 0.306 | 
| NB/128   | 0.106 | 0.106 | 0.206 | 0.606 | 
| NB/64| 0.106 | 0.106 | 0.306 | 1.006 | 
| NB/32| 0.106 | 0.106 | 0.406 | 1.608 | 
| NB/16| 0.106 | 0.106 | 0.706 | 3.009 | 
| NB/8 | 0.106 | 0.106 | 1.307 | 6.114 |

I used the following configurations:
[postgesql.conf]
shared_buffers = 100GB #20GB,1GB,128MB
autovacuum = off
full_page_writes = off
checkpoint_timeout = 30min
max_locks_per_transaction = 1
max_wal_size = 20GB

# For streaming replication from primary. Don't uncomment on Standby.
synchronous_commit = remote_write
synchronous_standby_names = 'walreceiver'

# For Standby. Don't uncomment on Primary.
# hot_standby = on
#primary_conninfo = 'host=... user=... port=... application_name=walreceiver'

--
B. Regression Test using the NBuffers/32 Threshold (V39 Patches)

For this 

Re: New Table Access Methods for Multi and Single Inserts

2021-01-06 Thread Bharath Rupireddy
On Wed, Jan 6, 2021 at 12:56 PM Luc Vlaming  wrote:
> The main reason for me for wanting a single API is that I would like the
> decision of using single or multi inserts to move to inside the tableam.
> For e.g. a heap insert we might want to put the threshold at e.g. 100
> rows so that the overhead of buffering the tuples is actually
> compensated. For other tableam this logic might also be quite different,
> and I think therefore that it shouldn't be e.g. COPY or CTAS deciding
> whether or not multi inserts should be used. Because otherwise the thing
> we'll get is that there will be tableams that will ignore this flag and
> do their own thing anyway. I'd rather have an API that gives all
> necessary information to the tableam and then make the tableam do "the
> right thing".
>
> Another reason I'm suggesting this API is that I would expect that the
> begin is called in a different place in the code for the (multiple)
> inserts than the actual insert statement.
> To me conceptually the begin and end are like e.g. the executor begin
> and end: you prepare the inserts with the knowledge you have at that
> point. I assumed (wrongly?) that during the start of the statement one
> knows best how many rows are coming; and then the actual insertion of
> the row doesn't have to deal anymore with multi/single inserts, choosing
> when to buffer or not, because that information has already been given
> during the initial phase. One of the reasons this is appealing to me is
> that e.g. in [1] there was discussion on when to switch to a multi
> insert state, and imo this should be up to the tableam.

Agree that whether to go with multi or single inserts should be
completely left to the tableam implementation; we, as callers of those APIs,
just need to inform it whether we expect single or multiple rows, and it
should be left to the tableam implementation whether to actually go with
buffering or single inserts. ISTM that it's an elegant way of making
the API generic and abstracting everything from the callers. What I
wonder is how we can know in advance the expected row count that we
need to pass in to heap_insert_begin(). IIUC, we cannot estimate the
upcoming rows in COPY, Insert Into Select, Refresh Mat View, or some
other insert queries. Of course, we can look at the planner's
estimated row count for the selects in COPY, Insert Into Select or
Refresh Mat View after planning, but to me that's not something we
can depend on to pass the row count to the insert APIs.

When we don't know the expected row count, why can't we (as callers of
the APIs) tell the APIs something like, "I'm intending to perform
multi inserts, so if possible and if you have a mechanism to buffer
the slots, do it; otherwise insert the tuples one by one, or else do
whatever you want to do with the tuples I give you". So, in case of
COPY we can ask the API for multi inserts and call heap_insert_begin()
and heap_insert_v2().

Given the above explanation, I still feel bool is_multi would suffice.

Thoughts?

On dynamically switching from single to multi inserts: this can be
done by heap_insert_v2 itself. The way I think it's possible is that,
say, we have some threshold row count of 1000 (it can be a macro); after
inserting that many tuples, heap_insert_v2 can switch to buffering
mode.

Thoughts?
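
Something like the following is what I have in mind (purely illustrative
sketch; the threshold, the ntuples_seen counter and the helper names are
assumptions, not the proposed API):

    #define HEAP_SWITCH_TO_MULTI_THRESHOLD  1000    /* assumed value */

    void
    heap_insert_v2(TableInsertState *state, TupleTableSlot *slot)
    {
        /* caller asked for multi inserts; start buffering after a while */
        if (state->is_multi && state->mistate == NULL &&
            ++state->ntuples_seen >= HEAP_SWITCH_TO_MULTI_THRESHOLD)
            heap_multi_insert_begin(state);     /* assumed helper */

        if (state->mistate != NULL)
            heap_multi_insert_v2(state, slot);  /* buffers, flushes when full */
        else
            heap_insert_single(state, slot);    /* assumed single-row path */
    }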

> Which would make the code something like:
>
> void
> heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
> {
> TupleTableSlot  *batchslot;
> HeapMultiInsertState *mistate = (HeapMultiInsertState 
> *)state->mistate;
> Size sz;
>
> Assert(mistate && mistate->slots);
>
> if (mistate->slots[mistate->cur_slots] == NULL)
> mistate->slots[mistate->cur_slots] =
> 
> table_slot_create(state->rel, NULL);
>
> batchslot = mistate->slots[mistate->cur_slots];
>
> ExecClearTuple(batchslot);
> ExecCopySlot(batchslot, slot);
>
> /*
>  * Calculate the tuple size after the original slot is copied, 
> because the
>  * copied slot type and the tuple size may change.
>  */
> sz = GetTupleSize(batchslot, mistate->max_size);
>
> Assert(sz > 0);
>
> mistate->cur_slots++;
> mistate->cur_size += sz;
>
> if (mistate->cur_slots >= mistate->max_slots ||
> mistate->cur_size >= mistate->max_size)
> heap_multi_insert_flush(state);
> }

I think clearing tuples before copying the slot as you suggested may
work without the need of clear_slots flag.

>
> >  > Also, why do we want to do ExecClearTuple() anyway? Isn't
> >  > it good enough that the next call to ExecCopySlot will effectively clear
> >  > it out?
> >
> > For virtual, heap, minimal tuple slots, yes ExecCopySlot clears the
> > slot before copying. But, for buffer heap slots, the
> > tts_buffer_heap_copyslot does not always clear the destination slot, see
> > 

Re: Track replica origin progress for Rollback Prepared

2021-01-06 Thread Amit Kapila
On Wed, Jan 6, 2021 at 5:18 PM Michael Paquier  wrote:
>
> On Tue, Jan 05, 2021 at 04:24:21PM +0530, Amit Kapila wrote:
> > There are already tests [1] in one of the upcoming patches for logical
> > decoding of 2PC which covers this code using which I have found this
> > problem. So, I thought those would be sufficient. I have not checked
> > the feasibility of using test_decoding because I thought adding more
> > using test_decoding will unnecessarily duplicate the tests.
>
> Hmm.  This stuff does not check after replication origins even if it
> stresses 2PC, so that looks incomplete when seen from here.
>

I think it does. Let me try to explain in a bit more detail.
Internally, the apply worker uses replication origins to track the
progress of apply, see the code near
ApplyWorkerMain->replorigin_create. We will store the progress (WAL
LSN) for each commit (prepared)/ rollback prepared with this origin.
If the server crashes and restarts, we will use the origin's LSN as
the start decoding point (the subscriber sends the last LSN to the
publisher). The bug here is that after restart the origin was not
advanced for rollback prepared which I have fixed with this patch.
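
For reference, the shape of the fix is roughly the following (a sketch only;
origin_id and lsn come from the WAL record being replayed, see the patch for
the exact code):

    /* in xact_redo_abort(), when replaying ROLLBACK PREPARED */
    if (parsed->xinfo & XACT_XINFO_HAS_ORIGIN)
    {
        /* recover apply progress, as is already done for commit prepared */
        replorigin_advance(origin_id, parsed->origin_lsn, lsn,
                           false /* backward */ , false /* WAL */ );
    }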

Now, let us see how the tests mentioned by me cover this code. In the
first test (check that 2PC gets replicated to subscriber then ROLLBACK
PREPARED), we do below on publisher and wait for it to be applied on
the subscriber.
BEGIN;
INSERT INTO tab_full VALUES (12);
PREPARE TRANSACTION 'test_prepared_tab_full';
ROLLBACK PREPARED 'test_prepared_tab_full';

Note that we would have WAL logged the LSN (replication_origin_lsn)
corresponding to ROLLBACK PREPARED on the subscriber during apply.
Now, in the second test(Check that ROLLBACK PREPARED is decoded
properly on crash restart (publisher and subscriber crash)), we
prepare a transaction and crash the server. After the restart, because
we have not advanced the replication origin in the recovery of
Rollback Prepared, the subscriber won't consider that the transaction has
been applied, so it requests that transaction again.

Strictly speaking, we don't need the second test to reproduce this
exact problem; if we had restarted after the first test, the problem
would have been reproduced, but I was consistently getting the problem
with the current way the tests are written. However, we can change it
slightly to restart after the first test if we want.

-- 
With Regards,
Amit Kapila.




Re: Track replica origin progress for Rollback Prepared

2021-01-06 Thread Michael Paquier
On Tue, Jan 05, 2021 at 04:24:21PM +0530, Amit Kapila wrote:
> There are already tests [1] in one of the upcoming patches for logical
> decoding of 2PC which covers this code using which I have found this
> problem. So, I thought those would be sufficient. I have not checked
> the feasibility of using test_decoding because I thought adding more
> using test_decoding will unnecessarily duplicate the tests.

Hmm.  This stuff does not check after replication origins even if it
stresses 2PC, so that looks incomplete when seen from here.

> How about something like "Dump transaction origin information only for
> abort prepared. We need this during recovery to update the replication
> origin progress."?

That sounds fine.
--
Michael


signature.asc
Description: PGP signature


Re: Incorrect allocation handling for cryptohash functions with OpenSSL

2021-01-06 Thread Michael Paquier
On Sat, Dec 19, 2020 at 03:13:50PM +0900, Michael Paquier wrote:
> At the same time, I have taken care of your comment from upthread to
> return a failure if the caller passes NULL for the context, and
> adjusted some comments.  What do you think of the attached?

I have looked again at this thread with a fresher mind and I did not
see a problem with the previous patch, except some indentation
issues.  So if there are no objections, I'd like to commit the
attached.
--
Michael
diff --git a/src/include/common/cryptohash.h b/src/include/common/cryptohash.h
index 4c3df8e5ae..3ecaf62113 100644
--- a/src/include/common/cryptohash.h
+++ b/src/include/common/cryptohash.h
@@ -25,12 +25,8 @@ typedef enum
 	PG_SHA512
 } pg_cryptohash_type;
 
-typedef struct pg_cryptohash_ctx
-{
-	pg_cryptohash_type type;
-	/* private area used by each hash implementation */
-	void	   *data;
-} pg_cryptohash_ctx;
+/* opaque context, private to each cryptohash implementation */
+typedef struct pg_cryptohash_ctx pg_cryptohash_ctx;
 
 extern pg_cryptohash_ctx *pg_cryptohash_create(pg_cryptohash_type type);
 extern int	pg_cryptohash_init(pg_cryptohash_ctx *ctx);
diff --git a/src/common/cryptohash.c b/src/common/cryptohash.c
index 1921a33b34..efedadd626 100644
--- a/src/common/cryptohash.c
+++ b/src/common/cryptohash.c
@@ -39,6 +39,21 @@
 #define FREE(ptr) free(ptr)
 #endif
 
+/* Internal pg_cryptohash_ctx structure */
+struct pg_cryptohash_ctx
+{
+	pg_cryptohash_type type;
+
+	union
+	{
+		pg_md5_ctx	md5;
+		pg_sha224_ctx sha224;
+		pg_sha256_ctx sha256;
+		pg_sha384_ctx sha384;
+		pg_sha512_ctx sha512;
+	}			data;
+};
+
 /*
  * pg_cryptohash_create
  *
@@ -50,38 +65,18 @@ pg_cryptohash_create(pg_cryptohash_type type)
 {
 	pg_cryptohash_ctx *ctx;
 
+	/*
+	 * Note that this always allocates enough space for the largest hash. A
+	 * smaller allocation would be enough for md5, sha224 and sha256, but the
+	 * small extra amount of memory does not make it worth complicating this
+	 * code.
+	 */
 	ctx = ALLOC(sizeof(pg_cryptohash_ctx));
 	if (ctx == NULL)
 		return NULL;
-
+	memset(ctx, 0, sizeof(pg_cryptohash_ctx));
 	ctx->type = type;
 
-	switch (type)
-	{
-		case PG_MD5:
-			ctx->data = ALLOC(sizeof(pg_md5_ctx));
-			break;
-		case PG_SHA224:
-			ctx->data = ALLOC(sizeof(pg_sha224_ctx));
-			break;
-		case PG_SHA256:
-			ctx->data = ALLOC(sizeof(pg_sha256_ctx));
-			break;
-		case PG_SHA384:
-			ctx->data = ALLOC(sizeof(pg_sha384_ctx));
-			break;
-		case PG_SHA512:
-			ctx->data = ALLOC(sizeof(pg_sha512_ctx));
-			break;
-	}
-
-	if (ctx->data == NULL)
-	{
-		explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
-		FREE(ctx);
-		return NULL;
-	}
-
 	return ctx;
 }
 
@@ -95,24 +90,24 @@ int
 pg_cryptohash_init(pg_cryptohash_ctx *ctx)
 {
 	if (ctx == NULL)
-		return 0;
+		return -1;
 
 	switch (ctx->type)
 	{
 		case PG_MD5:
-			pg_md5_init((pg_md5_ctx *) ctx->data);
+			pg_md5_init(&ctx->data.md5);
 			break;
 		case PG_SHA224:
-			pg_sha224_init((pg_sha224_ctx *) ctx->data);
+			pg_sha224_init(&ctx->data.sha224);
 			break;
 		case PG_SHA256:
-			pg_sha256_init((pg_sha256_ctx *) ctx->data);
+			pg_sha256_init(&ctx->data.sha256);
 			break;
 		case PG_SHA384:
-			pg_sha384_init((pg_sha384_ctx *) ctx->data);
+			pg_sha384_init(&ctx->data.sha384);
 			break;
 		case PG_SHA512:
-			pg_sha512_init((pg_sha512_ctx *) ctx->data);
+			pg_sha512_init(&ctx->data.sha512);
 			break;
 	}
 
@@ -123,30 +118,31 @@ pg_cryptohash_init(pg_cryptohash_ctx *ctx)
  * pg_cryptohash_update
  *
  * Update a hash context.  Note that this implementation is designed
- * to never fail, so this always returns 0.
+ * to never fail, so this always returns 0 except if the caller has
+ * given a NULL context.
  */
 int
 pg_cryptohash_update(pg_cryptohash_ctx *ctx, const uint8 *data, size_t len)
 {
 	if (ctx == NULL)
-		return 0;
+		return -1;
 
 	switch (ctx->type)
 	{
 		case PG_MD5:
-			pg_md5_update((pg_md5_ctx *) ctx->data, data, len);
+			pg_md5_update(&ctx->data.md5, data, len);
 			break;
 		case PG_SHA224:
-			pg_sha224_update((pg_sha224_ctx *) ctx->data, data, len);
+			pg_sha224_update(&ctx->data.sha224, data, len);
 			break;
 		case PG_SHA256:
-			pg_sha256_update((pg_sha256_ctx *) ctx->data, data, len);
+			pg_sha256_update(&ctx->data.sha256, data, len);
 			break;
 		case PG_SHA384:
-			pg_sha384_update((pg_sha384_ctx *) ctx->data, data, len);
+			pg_sha384_update(&ctx->data.sha384, data, len);
 			break;
 		case PG_SHA512:
-			pg_sha512_update((pg_sha512_ctx *) ctx->data, data, len);
+			pg_sha512_update(&ctx->data.sha512, data, len);
 			break;
 	}
 
@@ -157,30 +153,31 @@ pg_cryptohash_update(pg_cryptohash_ctx *ctx, const uint8 *data, size_t len)
  * pg_cryptohash_final
  *
  * Finalize a hash context.  Note that this implementation is designed
- * to never fail, so this always returns 0.
+ * to never fail, so this always returns 0 except if the caller has
+ * given a NULL context.
  */
 int
 pg_cryptohash_final(pg_cryptohash_ctx *ctx, uint8 *dest)
 {
 	if (ctx == NULL)
-		return 0;
+		return -1;
 
 	switch 

Re: [PATCH] Runtime control of CLOBBER_CACHE_ALWAYS

2021-01-06 Thread Peter Eisentraut

On 2021-01-06 00:30, Craig Ringer wrote:
Perhaps debug_invalidate_system_caches_always ? It's a bit long but we 
use the debug prefix elsewhere too.


Committed with that name.

In your original patch, this hunk in pg_config_manual.h:

+ * You can define these by editing pg_config_manual.h but it's usually
+ * sufficient to pass CFLAGS to configure, e.g.
+ *
+ * ./configure --enable-debug --enable-cassert CFLAGS="-DUSE_VALGRIND"
+ *
+ * The flags are persisted in Makefile.global so they will be correctly
+ * applied to extensions, including those built by PGXS.

I don't think that necessarily works for all settings.  Maybe we should 
make a pass through it and ensure it all works sensibly, but that seems 
like a separate project.


--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/




Re: Single transaction in the tablesync worker?

2021-01-06 Thread Amit Kapila
On Wed, Jan 6, 2021 at 2:13 PM Peter Smith  wrote:
>
> I think it makes sense. If there can be a race between the tablesync
> re-launching (after error), and the AlterSubscription_refresh removing
> some table’s relid from the subscription then there could be lurking
> slot/origin tablesync resources (of the removed table) which a
> subsequent DROP SUBSCRIPTION cannot discover. I will think more about
> how/if it is possible to make this happen. Anyway, I suppose I ought
> to refactor/isolate some of the tablesync cleanup code in case it
> needs to be commonly called from DropSubscription and/or from
> AlterSubscription_refresh.
>

Fair enough. BTW, I have analyzed whether we need any modifications to
pg_dump/restore for this patch as this changes the state of one of the
fields in the system table and concluded that we don't need any
change. For subscriptions, we don't dump any of the information from
pg_subscription_rel, rather we just dump subscriptions with the
connect option as false which means users need to enable the
subscription and refresh publication after restore. I have checked
this in the code and tested it as well. The related information is
present in pg_dump doc page [1], see from "When dumping logical
replication subscriptions ".

[1] - https://www.postgresql.org/docs/devel/app-pgdump.html

-- 
With Regards,
Amit Kapila.




RE: [Patch] Optimize dropping of relation buffers using dlist

2021-01-06 Thread k.jami...@fujitsu.com
On Sunday, January 3, 2021 10:35 PM (JST), Amit Kapila wrote:
> On Sat, Jan 2, 2021 at 7:47 PM k.jami...@fujitsu.com
>  wrote:
> >
> > Happy new year. The V38 LGTM.
> > Apologies for a bit of delay on posting the test results, but since
> > it's the start of commitfest, here it goes and the results were interesting.
> >
> > I executed a VACUUM test using the same approach that Tsunakawa-san
> > did in [1], but only this time, the total time that DropRelFileNodeBuffers()
> took.
> >
> 
> Please specify the exact steps like did you deleted all the rows from a table 
> or
> some of it or none before performing Vacuum? How did you measure this
> time, did you removed the cached check? It would be better if you share the
> scripts and or the exact steps so that the same can be used by others to
> reproduce.

Basically, I used the TimestampDifference function in DropRelFileNodeBuffers().
I also executed DELETE before VACUUM.
I also removed the nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD check
and used the threshold as the relation size.
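
Roughly, the instrumentation looked like this (an illustrative sketch rather
than the exact code I used):

    TimestampTz start_time = GetCurrentTimestamp();
    long        secs;
    int         usecs;

    /* ... existing DropRelFileNodeBuffers() body runs here ... */

    TimestampDifference(start_time, GetCurrentTimestamp(), &secs, &usecs);
    elog(LOG, "DropRelFileNodeBuffers() took %ld.%06d s", secs, usecs);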

> Hmm, in the tests done by Tang, the results indicate that in some cases the
> patched version is slower at even NBuffers/32, so not sure if we can go to
> values shown by you unless she is doing something wrong. I think the
> difference in results could be because both of you are using different
> techniques to measure the timings, so it might be better if both of you can
> share scripts or exact steps used to measure the time and other can use the
> same technique and see if we are getting consistent results.

Right, since we want consistent results, please disregard the approach I used.
I will resume testing in the same way as Tang, because she also executed the original failover
test which I had been doing before.
To avoid confusion and to check whether my results and Tang's are consistent,
I also did the recovery/failover test for VACUUM on a single relation, which I will send
in a separate email after this.

Regards,
Kirk Jamison


RE: Enhance traceability of wal_level changes for backup management

2021-01-06 Thread osumi.takami...@fujitsu.com
Hello, Sawada-san


I'll continue the discussion of [2].
We talked about how to recognize the time or LSN
when/where wal_level is changed to 'none' there.

You said 
> The use case I imagined is that the user temporarily
> changes wal_level to 'none' from 'replica' or 'logical' to speed up loading 
> and
> changes back to the normal. In this case, the backups taken before the
> wal_level change cannot be used to restore the database to the point after the
> wal_level change. So I think backup management tools would want to
> recognize the time or LSN when/where wal_level is changed to ‘none’ in order
> to do some actions such as invalidating older backups, re-calculating backup
> redundancy etc.
> Actually the same is true when the user changes to ‘minimal’. The tools would
> need to recognize the time or LSN in this case too. Since temporarily changing
> wal_level has been an uncommon use case some tools perhaps are not aware
> of that yet. But since introducing wal_level = ’none’ could make the change
> common, I think we need to provide a way for the tools.

I wondered, couldn't backup management tools utilize the information
in the backup that is needed to be made when wal_level is changed to "none" for 
example ?

As I said before, existing backup management tools support
only wal_level=replica or logical at present. And, if they would wish to alter 
the
status quo and want to cover the changes over wal_levels, I felt it's natural 
that
they support a feature like taking a full backup, triggered by the wal_level 
changes (or server stop).

This is because taking a backup is a must for wal_level=none,
as I described in the patch of wal_level=none.
For example, they could prepare an easy way to
take an offline physical backup when the server stops for changing the 
wal_level.
(Here, they can support the change to minimal, too.)

What do you think?

[2] - 
https://www.postgresql.org/message-id/CAD21AoCotoAxxCmMVz6niwg4j6c3Er_-GboTLmHBft8pALpOGA%40mail.gmail.com

Best Regards,
Takamichi Osumi




Enhance traceability of wal_level changes for backup management

2021-01-06 Thread osumi.takami...@fujitsu.com
Hi,


This thread came from another thread about wal_level [1].

Mainly from the perspective of backup management tools
such as pgBackRest, EDB's BART and pg_probackup,
it seems worth discussing a comprehensive way
to trace and recognize wal_level changes, for various purposes
such as the necessity of invalidating old backups.

In the thread [1], I talk about wal_level='none', but
this kind of topic also applies to changing wal_level to 'minimal'
from a higher level. Accordingly, I started this topic as a new independent
thread.

Currently, the backup management tools described above
work when wal_level is higher than minimal
because they use physical online backups or WAL archiving,
but any useful ideas for backup management
related to wal_level changes are welcome.

[1] - 
https://www.postgresql.org/message-id/TYAPR01MB29901EBE5A3ACCE55BA99186FE320%40TYAPR01MB2990.jpnprd01.prod.outlook.com

Best Regards,
Takamichi Osumi



