RE: Avoid CommandCounterIncrement in RI trigger when INSERT INTO referencing table

2021-03-21 Thread houzj.f...@fujitsu.com
Hi,

Thanks for the review.

> I noticed some things on the first scan through:
> 
> Patch 0001:
> 1) Tidy up the comments a bit:
> 
> Suggest the following update to part of the comments:

Changed.

> Patch 0002:
> 1) The new max_parallel_hazard_context member "pk_rels" is not being 
> set (to
> NIL) in the is_parallel_safe() function, so it will have a junk value 
> in that case - though it does look like nothing could reference it 
> then (but the issue may be detected by a Valgrind build, as performed by the 
> buildfarm).

Changed.

> 2) Few things to tidy up the patch comments:

Changed.

> 3) In target_rel_trigger_max_parallel_hazard(), you have added a 
> variable declaration "int trigtype;" after code, instead of before:

Changed.

Attaching a new version of the patch with these changes.

Also attaching the simple performance test results (insert into table with 
foreign key):
postgres=# explain (analyze, verbose) insert into fk select a,func_xxx() from 
data where a%2=0 or a%3 = 0;
 workers   | serial insert + parallel select | parallel insert | performance gain
-----------+---------------------------------+-----------------+------------------
 2 workers | 85512.153ms                     | 61384.957ms     | 29%
 4 workers | 85436.957ms                     | 39335.797ms     | 54%

-data prepare-
create table pk(a int primary key,b int);
create table fk(a int references pk,b int);
create table data(a int, b int);
insert into data select * from generate_series(1,1000,1) t;
insert into pk select generate_series(1,1000,1);


Best regards,
houzj


v3-0001-skip-cci.patch
Description: v3-0001-skip-cci.patch


v3-0002-extend-safery-check-for-fk-relation.patch
Description: v3-0002-extend-safery-check-for-fk-relation.patch


Re: Permission failures with WAL files in 13~ on Windows

2021-03-21 Thread Michael Paquier
On Thu, Mar 18, 2021 at 12:01:40PM +0900, Michael Paquier wrote:
> Yep, that's actually something I wrote for my own setups, with
> log_checkpoints enabled to catch all concurrent checkpoint activity
> and some LOGs.  Still no luck unfortunately :(

The various reporters had more luck than myself in reproducing the
issue, so I have applied 909b449e to address the issue.  I am pretty
sure that we should review more this business in the future, but I'd
rather not touch the stable branches.
--
Michael




Re: Can we remove extra memset in BloomInitPage, GinInitPage and SpGistInitPage when we have it in PageInit?

2021-03-21 Thread Mahendra Singh Thalor
On Mon, 22 Mar 2021 at 10:16, Bharath Rupireddy
 wrote:
>
> Hi,
>
> We are memset-ting the special space of the page, which is already zeroed
> by PageInit, in BloomInitPage, GinInitPage and SpGistInitPage. We have
> already removed the memset after PageInit in gistinitpage (see the
> comment there).  Unless I'm missing something, IMO they are redundant.
> I'm attaching a small patch that gets rid of the extra memset calls.
>
>
> While on it, I removed MAXALIGN(sizeof(SpGistPageOpaqueData)) in
> SpGistInitPage because PageInit will align the specialSize anyway.
> This change is in line with other places (such as BloomInitPage,
> brin_page_init, GinInitPage, gistinitpage, _hash_pageinit and so on)
> where we just pass the size of the special space data structure.
>
> I didn't see any regression test failure on my dev system with the
> attached patch.
>
>
> Thoughts?

Your changes look fine to me and I am also not getting any failure. I
think we should back-patch this to all the branches.

The patch applies to all the branches (down to v9.5) and there are no failures.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com




Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

2021-03-21 Thread Fujii Masao



On 2021/03/22 14:03, shinya11.k...@nttdata.com wrote:

Barring any objection, I will commit this.


I think it's good except for a typo "thoes four bits"


Thanks for the review! Attached is the updated version of the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 610f65e471..f8b8afe4a7 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -438,6 +438,15 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats 
*stats,
 
recid = XLogRecGetInfo(record) >> 4;
 
+   /*
+* XACT records need to be handled differently. Those records use the
+* first bit of those four bits for an optional flag variable and the
+* following three bits for the opcode. We filter opcode out of xl_info
+* and use it as the identifier of the record.
+*/
+   if (rmid == RM_XACT_ID)
+   recid &= 0x07;
+
stats->record_stats[rmid][recid].count++;
stats->record_stats[rmid][recid].rec_len += rec_len;
stats->record_stats[rmid][recid].fpi_len += fpi_len;
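
For illustration, here is a small standalone program showing why XACT records previously produced two stats lines for the same opcode. The constants mirror src/include/access/xact.h, but the program itself is only a sketch written for this summary, not code from the patch:

#include <stdio.h>

#define XLOG_XACT_COMMIT	0x00	/* opcode lives in bits 4-6 of xl_info */
#define XLOG_XACT_HAS_INFO	0x80	/* optional-flag bit shares the top nibble */

int
main(void)
{
	unsigned char plain = XLOG_XACT_COMMIT;
	unsigned char flagged = XLOG_XACT_COMMIT | XLOG_XACT_HAS_INFO;

	/* Before the fix: recid 0 vs recid 8, i.e. two COMMIT lines in --stats */
	printf("unmasked: %d vs %d\n", plain >> 4, flagged >> 4);

	/* After the fix: masking with 0x07 drops the flag bit, both become 0 */
	printf("masked:   %d vs %d\n", (plain >> 4) & 0x07, (flagged >> 4) & 0x07);
	return 0;
}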


Re: [HACKERS] make async slave to wait for lsn to be replayed

2021-03-21 Thread Kyotaro Horiguchi
At Thu, 18 Mar 2021 18:57:15 +0500, Ibrar Ahmed  wrote 
in 
> On Thu, Jan 21, 2021 at 1:30 PM Kyotaro Horiguchi 
> wrote:
> 
> > Hello.
> >
> > At Wed, 18 Nov 2020 15:05:00 +0300, a.pervush...@postgrespro.ru wrote in
> > > I've changed the BEGIN WAIT FOR LSN statement to core functions
> > > pg_waitlsn, pg_waitlsn_infinite and pg_waitlsn_no_wait.
> > > Currently the functions work inside repeatable read transactions, but
> > > waitlsn creates a snapshot if called first in a transaction block,
> > > which can possibly lead the transaction to working incorrectly, so the
> > > function gives a warning.
> >
> > According to the discussion here, implementing this as functions is not
> > optimal. As a PoC, I made it a procedure. However, I'm not sure this is
> > the correct implementation as a native procedure, but it seems to work
> > as expected.
> >
> > > Usage examples
> > > ==
> > > select pg_waitlsn(‘LSN’, timeout);
> > > select pg_waitlsn_infinite(‘LSN’);
> > > select pg_waitlsn_no_wait(‘LSN’);
> >
> > The first and second usages are covered by a single procedure. The last
> > function is equivalent to pg_last_wal_replay_lsn(). As the result, the
> > following procedure is provided in the attached.
> >
> > pg_waitlsn(wait_lsn pg_lsn, timeout integer DEFAULT -1)
> >
> > Any opinions mainly compared to implementation as a command?
> >
> > regards.
> >
> > --
> > Kyotaro Horiguchi
> > NTT Open Source Software Center
> >
> 
> The patch (pg_waitlsn_v10_2_kh.patch) does not compile successfully and has
> compilation errors. Can you please take a look?
> 
> https://cirrus-ci.com/task/6241565996744704
> 
> xlog.c:45:10: fatal error: commands/wait.h: No such file or directory
> #include "commands/wait.h"
> ^
> compilation terminated.
> make[4]: *** [: xlog.o] Error 1
> make[4]: *** Waiting for unfinished jobs
> make[3]: *** [../../../src/backend/common.mk:39: transam-recursive] Error 2
> make[2]: *** [common.mk:39: access-recursive] Error 2
> make[1]: *** [Makefile:42: all-backend-recurse] Error 2
> make: *** [GNUmakefile:11: all-src-recurse] Error 2
> 
> I am changing the status to  "Waiting on Author"

Anna is the author.  The "patch" was just to show how we can implement
the feature as a procedure. (Sorry for the bad mistake I made.)

The patch still applies to the master. So I resend just rebased
version as v10_2, and attached the "PoC" as *.txt which applies on top
of the patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6f8810e149..3c580083dd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -7535,6 +7536,15 @@ StartupXLOG(void)
 	break;
 }
 
+/*
+ * If we replayed an LSN that someone was waiting for,
+ * set latches in shared memory array to notify the waiter.
+ */
+if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitedLSN())
+{
+	WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+}
+
 /* Else, try to fetch the next WAL record */
 record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0ae4..2c0bd41336 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -60,6 +60,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00..1f2483672b
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,297 @@
+/*-
+ *
+ * wait.c
+ *	  Implements waitlsn, which allows waiting for events such as
+ *	  LSN having been replayed on replica.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 2020, Regents of PostgresPro
+ *
+ * IDENTIFICATION
+ *	  src/backend/commands/wait.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlogdefs.h"
+#include "commands/wait.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "storage/backendid.h"
+#include "storage/pmsignal.h"
+#include "storage/proc.h"
+#include "storage/shmem.h"
+#include "storage/sinvaladt.h"
+#include "storage/spin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/timestamp.h"
+
+/* Add to shared memory array */
+static void AddWaitedLSN(XLogRecPtr lsn_to_wait);
+
+/* Shared memory structure */
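
Since the rest of wait.c is not shown above, here is a hedged sketch of the waiter side that the StartupXLOG() hunk implies: the backend registers its target LSN in the shared array and then sleeps on its process latch until the startup process calls WaitSetLatch() after replaying past that point. AddWaitedLSN() is declared in the visible part of the patch; DeleteWaitedLSN() and the wait-event constant are placeholders, not the PoC's actual names:

static void
WaitForLSNReplay(XLogRecPtr target_lsn)
{
	AddWaitedLSN(target_lsn);		/* register in the shared array */

	while (GetXLogReplayRecPtr(NULL) < target_lsn)
	{
		(void) WaitLatch(MyLatch,
						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
						 100L,		/* recheck every 100 ms */
						 PG_WAIT_EXTENSION);
		ResetLatch(MyLatch);
		CHECK_FOR_INTERRUPTS();		/* honor query cancel */
	}

	DeleteWaitedLSN();				/* placeholder: deregister the entry */
}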

RE: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

2021-03-21 Thread Shinya11.Kato
>-Original Message-
>From: Fujii Masao 
>Sent: Monday, March 22, 2021 11:22 AM
>To: shinya11.k...@nttdata.com; da...@pgmasters.net; movead...@highgo.ca
>Cc: pgsql-hack...@postgresql.org; and...@anarazel.de; mich...@paquier.xyz;
>ahsan.h...@highgo.ca; horikyota@gmail.com
>Subject: Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.
>
>
>
>On 2021/03/19 18:27, Fujii Masao wrote:
>>
>>
>> On 2021/03/19 15:06, shinya11.k...@nttdata.com wrote:
>> But 0 value maybe looks strange, so in current version I show it like
>> below:
>> Type                  N     (%)  Record size     (%)  FPI size   (%)  Combined size     (%)
>> ----                  -     ---  -----------     ---  --------   ---  -------------     ---
>> XLOG/SWITCH_JUNK      -  (    -)    11006248  (72.26)        -  (  -)      11006248  (65.78)
>> Transaction/COMMIT   10  ( 0.03)         340  ( 0.00)        0  (0.00)           340  ( 0.00)
>
> I just wanted to know why the "count" and "fpi_len" fields are 0.
> So, it would be nice to have 0 values. Sorry for confusing you.

 Kato, it's not clear to me if you were asking for - to be changed back to 
 0?

 You marked the patch as Ready for Committer so I assume not, but it
 would be better to say clearly that you think this patch is ready for a
>committer to look at.
>>>
>>> Yes, I don't think it needs to be changed back to 0.
>>> I think this patch is ready for a committer to look at.
>>
>> What's the use case of this feature? I read through this thread
>> briefly, but I'm still not sure how useful this feature is.
>>
>> Horiguchi-san reported one issue upthread; --stats=record shows two
>> lines for Transaction/COMMIT record. Probably this should be fixed
>> separately.
>>
>> Horiguchi-san,
>> Do you have updated version of that bug-fix patch?
>> Or you started another thread for that issue?
>
>I confirmed that only XACT records need to be processed differently.
>So the patch that Horiguchi-san posted upthread looks good and sufficient to me.
>I added a bit more detailed information to the comment in the patch.
>Attached is the updated version of the patch. Since this issue looks like a bug,
>I'm thinking of back-patching it. Thoughts?
>
>Barring any objection, I will commit this.

I think it's good except for a typo "thoes four bits"

Regards,
Shinya Kato





Re: Type of wait events WalReceiverWaitStart and WalSenderWaitForWAL

2021-03-21 Thread Fujii Masao



On 2021/03/22 12:01, Kyotaro Horiguchi wrote:

WAIT_EVENT_WAL_RECEIVER_WAIT_START is waiting for the startup
process to kick me.  So it may be either IPC or Activity.  Since
walreceiver hasn't sent anything to startup, it's activity rather
than IPC.  However, the behavior can be said to convey a piece of
information from startup to walreceiver, so it can also be said to be
an IPC. (That is the reason why I don't object to IPC.)


IMO this should be IPC because walreceiver is mainly waiting for the
interaction with the startup process, during this wait event. Since
you can
live with IPC, probably our consensus is to use IPC?


Exactly.


Ok, so barring any objection, I will commit the patch that I posted upthread.



Mmm. I agree that it waits for WAL in most cases, but still WAL-wait
is activity for me because it is not waiting to be cued by someone,
but waiting for new WAL to arrive so it can perform its main purpose.
If it's an IPC, should all waits other than pure sleep fall into
IPC?  (I was confused by the comment of WalSndWait, which doesn't
state that it is also waiting for the latch.)

Another point I'd like to raise is that the client_wait case should be
distinct from the WAL-wait, since it is a significant sign of what is
happening.

So I propose two changes here.

a. Rewrite the comment of WalSndWait so that it states that "also
   waiting for latch-set".


+1



b. Split the event to two different events.

-   WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_MAIN);
+   WalSndWait(wakeEvents, sleeptime,
+  pq_is_send_pending() ? WAIT_EVENT_WAL_SENDER_WRITE_DATA:
+ WAIT_EVENT_WAL_SENDER_MAIN);

And _WRITE_DATA as client_wait and _SENDER_MAIN as activity.

What do you think about this?


I'm ok with this. What about the attached patch (WalSenderWriteData.patch)?


Yes. The WAIT_EVENT_WAL_SENDER_WAIT_WAL is functionally equivalent to
WAIT_EVENT_WAL_SENDER_MAIN.  So I think it should be in
the same category as WAIT_EVENT_WAL_SENDER_MAIN. And as in 1 above,
the wait_client case should be distinct from the _MAIN event.


+1. What about the attached patch (WalSenderWaitForWAL.patch)?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 19540206f9..03f440aa25 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1177,8 +1177,8 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
  
  
   WalSenderWriteData
-  Waiting for any activity when processing replies from WAL
-   receiver in WAL sender process.
+  Waiting to write WAL data to WAL receiver in
+   WAL sender process.
  
 

diff --git a/src/backend/replication/walsender.c 
b/src/backend/replication/walsender.c
index 23baa4498a..acaec753c1 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -3115,15 +3115,22 @@ WalSndWakeup(void)
 }
 
 /*
- * Wait for readiness on the FeBe socket, or a timeout.  The mask should be
- * composed of optional WL_SOCKET_WRITEABLE and WL_SOCKET_READABLE flags.  Exit
- * on postmaster death.
+ * Wait for readiness on the FeBe socket, a timeout, or the latch to be set.
+ * The mask should be composed of optional WL_SOCKET_WRITEABLE and
+ * WL_SOCKET_READABLE flags.  Exit on postmaster death.
+ *
+ * Overwrite wait_event with WAIT_EVENT_WAL_SENDER_WRITE_DATA
+ * if we have pending data in the output buffer and are waiting to write
+ * data to a client.
  */
 static void
 WalSndWait(uint32 socket_events, long timeout, uint32 wait_event)
 {
WaitEvent event;
 
+   if (socket_events & WL_SOCKET_WRITEABLE)
+   wait_event = WAIT_EVENT_WAL_SENDER_WRITE_DATA;
+
ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, socket_events, NULL);
	if (WaitEventSetWait(FeBeWaitSet, timeout, &event, 1, wait_event) == 1 
&&
(event.events & WL_POSTMASTER_DEATH))
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 03f440aa25..02f3aa29db 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1105,7 +1105,13 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
  
  
   WalSenderMain
-  Waiting in main loop of WAL sender process.
+  Waiting in main loop of (physical replication) WAL sender
+   process.
+ 
+ 
+  WalSenderWaitForWAL
+  Waiting for WAL to be flushed in main loop of logical
+   replication WAL sender process.
  
  
   WalWriterMain
@@ -1171,10 +1177,6 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
   SSLOpenServer
   Waiting for SSL while attempting connection.
  
- 
-  WalSenderWaitForWAL
-  Waiting for WAL to be flushed in WAL sender process.
-  

Can we remove extra memset in BloomInitPage, GinInitPage and SpGistInitPage when we have it in PageInit?

2021-03-21 Thread Bharath Rupireddy
Hi,

We are memset-ting the special space of the page, which is already zeroed
by PageInit, in BloomInitPage, GinInitPage and SpGistInitPage. We have
already removed the memset after PageInit in gistinitpage (see the
comment there).  Unless I'm missing something, IMO they are redundant.
I'm attaching a small patch that gets rid of the extra memset calls.

While on it, I removed MAXALIGN(sizeof(SpGistPageOpaqueData)) in
SpGistInitPage because PageInit will align the specialSize anyway.
This change is in line with other places (such as BloomInitPage,
brin_page_init, GinInitPage, gistinitpage, _hash_pageinit and so on)
where we just pass the size of the special space data structure.
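
For reference, here is the relevant part of PageInit, abridged from src/backend/storage/page/bufpage.c (inline comments added): the MemSet zeroes the entire page, special space included, and specialSize is MAXALIGNed internally, which is why both the extra memset calls and the caller-side MAXALIGN look redundant.

void
PageInit(Page page, Size pageSize, Size specialSize)
{
	PageHeader	p = (PageHeader) page;

	specialSize = MAXALIGN(specialSize);	/* callers need not pre-align */

	Assert(pageSize == BLCKSZ);
	Assert(pageSize > specialSize + SizeOfPageHeaderData);

	/* Make sure all fields of page are zero, as well as unused space */
	MemSet(p, 0, pageSize);					/* zeroes the special space too */

	p->pd_lower = SizeOfPageHeaderData;
	p->pd_upper = pageSize - specialSize;
	p->pd_special = pageSize - specialSize;
	PageSetPageSizeAndVersion(page, pageSize, PG_PAGE_LAYOUT_VERSION);
}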

I didn't see any regression test failure on my dev system with the
attached patch.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v1-0001-Remove-extra-memset-in-BloomInitPage-GinInitPage-.patch
Description: Binary data


Re: Replication slot stats misgivings

2021-03-21 Thread Masahiko Sawada
On Sat, Mar 20, 2021 at 3:52 AM Andres Freund  wrote:
>
> - If max_replication_slots was lowered between a restart,
>   pgstat_read_statfile() will happily write beyond the end of
>   replSlotStats.

I think we cannot restart the server after lowering
max_replication_slots to a value less than the number of replication
slots actually created on the server. No?

>
> - pgstat_reset_replslot_counter() acquires ReplicationSlotControlLock. I
>   think pgstat.c has absolutely no business doing things on that level.

Agreed.

>
> - PgStat_ReplSlotStats etc use slotname[NAMEDATALEN]. Why not just NameData?

That's because we followed other definitions in pgstat.h that use
char[NAMEDATALEN]. I'm okay with using NameData.

>
> - pgstat_report_replslot() already has a lot of stats parameters, it
>   seems likely that we'll get more. Seems like we should just use a
>   struct of stats updates.

Agreed.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Disable WAL logging to speed up data loading

2021-03-21 Thread tsunakawa.ta...@fujitsu.com
From: Stephen Frost 
> The argument here seems to stem from the idea that issuing a 'TRUNCATE'
> inside the transaction before starting the 'COPY' command is 'too hard'.


> I could be wrong and perhaps opinions will change in the future, but it really
> doesn't seem like the there's much support for adding a new WAL level just to
> avoid doing the TRUNCATE.  Appealing to what other database systems
> support can be helpful in some cases but that's usually when we're looking at 
> a
> new capability which multiple other systems have implemented.  This isn't
> actually a new capability at all- the WAL level that you're looking for 
> already
> exists and already gives the optimization you're looking for, as long as you 
> issue
> a TRUNCATE at the start of the transaction.

No, we can't ask the user to use TRUNCATE, because they want to add data to an existing table.


Regards
TakayukiTsunakawa






Re: proposal - psql - use pager for \watch command

2021-03-21 Thread Pavel Stehule
po 22. 3. 2021 v 4:55 odesílatel Thomas Munro 
napsal:

> On Sun, Mar 21, 2021 at 10:38 PM Pavel Stehule 
> wrote:
> > Can somebody help me (with access on macOS) with debugging this issue?
>
> I'll try to figure it out, but maybe after the code freeze.  I started
> my programming career writing curses software a million years ago on a
> couple of extinct Unixes... I might even be able to remember how it
> works.  This is not a problem for committing the PSQL_WATCH_PAGER
> patch, I just mentioned it here because I thought that others might
> try it out on a Mac and be confused.
>

Thank you.

Probably there will not be an issue inside ncurses. The most complex part
of get_event is the polling of input sources (the tty and some others).
pspg should not stop there on tty reading.

Regards

Pavel


Re: proposal - psql - use pager for \watch command

2021-03-21 Thread Thomas Munro
On Sun, Mar 21, 2021 at 10:38 PM Pavel Stehule  wrote:
> Can somebody help me (with access on macOS) with debugging this issue?

I'll try to figure it out, but maybe after the code freeze.  I started
my programming career writing curses software a million years ago on a
couple of extinct Unixes... I might even be able to remember how it
works.  This is not a problem for committing the PSQL_WATCH_PAGER
patch, I just mentioned it here because I thought that others might
try it out on a Mac and be confused.




RE: Parallel INSERT (INTO ... SELECT ...)

2021-03-21 Thread houzj.f...@fujitsu.com
> Since the "Parallel SELECT for INSERT" patch and related GUC/reloption patch 
> have been committed, I have now rebased and attached the rest of the original 
> patchset, 
> which includes:
>- Additional tests for "Parallel SELECT for INSERT"
>- Parallel INSERT functionality
>- Test and documentation updates for Parallel INSERT
Hi,

I noticed that some comments may need to be updated since we introduced
parallel insert in this patch.

1) src/backend/executor/execMain.c
 * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
 * would require (a) storing the combocid hash in shared memory, rather
 * than synchronizing it just once at the start of parallelism, and (b) 
an
 * alternative to heap_update()'s reliance on xmax for mutual exclusion.
 * INSERT may have no such troubles, but we forbid it to simplify the
 * checks.

As we will allow INSERT in parallel mode, we'd better change the comment here.

2) src/backend/storage/lmgr/README
dangers are modest.  The leader and worker share the same transaction,
snapshot, and combo CID hash, and neither can perform any DDL or, indeed,
write any data at all.  Thus, for either to read a table locked exclusively by

Same as 1); parallel insert is now the exception.

3) src/backend/storage/lmgr/README
mutual exclusion method for such cases.  Currently, the parallel mode is
strictly read-only, but now we have the infrastructure to allow parallel
inserts and parallel copy.

Maybe we can say:
+mutual exclusion method for such cases.  Currently, we only allow parallel
+inserts, but we already have the infrastructure to allow parallel copy.


Best regards,
houzj




Re: a verbose option for autovacuum

2021-03-21 Thread Masahiko Sawada
On Sat, Mar 20, 2021 at 3:40 PM Michael Paquier  wrote:
>
> On Sat, Mar 20, 2021 at 01:06:51PM +0900, Masahiko Sawada wrote:
> > It's not bad but it seems redundant a bit to me. We pass the idx in
> > spite of passing also Irel[idx] and &(vacrelstats->indstats[idx]). I
> > think your first idea that is done in v4 patch (saving index names at
> > the beginning of heap_vacuum_rel() for autovacuum logging purpose
> > only) and the idea of deferring to close indexes until the end of
> > heap_vacuum_rel() so that we can refer index name at autovacuum
> > logging are more simple.
>
> Okay.
>
> > We need to initialize *stats with NULL here.
>
> Right.  I am wondering why I did not get any complain here.
>
> > If shared_indstats is NULL (e.g., we do " stats =
> > vacrelstats->indstats[indnum];"), vacrelstats->indstats[indnum] is not
> > updated since we pass  I think we should pass
> > &(vacrelstats->indstats[indnum]) instead in this case.
>
> If we get rid completely of this idea around indnum, that I don't
> disagree with so let's keep just indname, you mean to keep the second
> argument IndexBulkDeleteResult of vacuum_one_index() and pass down
> &(vacrelstats->indstats[indnum]) as argument.  No objections from me
> to just do that.
>
> > Previously, we update the element of the pointer array of index
> > statistics to the pointer pointing to either the local memory or DSM.
> > But with the above change, we do that only when the index statistics
> > are in the local memory. In other words, vacrelstats->indstats[i] is
> > never updated if the corresponding index supports parallel indexes. I
> > think this is not relevant with the change that we'd like to do here
> > (i.e., passing indnum down).
>
> Yeah, that looks like just some over-engineering design on my side.
> Would you like to update the patch with what you think is most
> adapted?

I've updated the patch. I saved the index names at the beginning of
heap_vacuum_rel() for autovacuum logging, and added indstats and
nindexes to LVRelStats. Some functions still take nindexes as a
function argument, but that seems to make sense since it corresponds to
the list of index relations (*Irel). Please review the patch.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v6-index_stats_log.patch
Description: Binary data


Re: Support ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION ... syntax

2021-03-21 Thread Bharath Rupireddy
On Sun, Mar 7, 2021 at 7:21 PM Japin Li  wrote:
> Thank you for pointing this out.  Fixed it in the v7 patch set.
>
> Please consider the v7 patch for further review.

Thanks for the patches. I just found the following behaviour with the
new ADD/DROP syntax: when the specified publication list has
duplicates, the patch throws a "publication is already present"
error. It adds the first instance of the duplicate into the list and
then checks the second instance against the just-updated list, which
throws the "already present" error. The error message implies that the
publication was already present in the subscription, but that's not
true. See my testing at [1].

I think we have two cases:
case 1: the publication(s) specified in the new ADD/DROP syntax may or
may not have already been associated with the subscription, so the errors
"publication is already present"/"publication doesn't exist" make sense.
case 2: there can be duplicate publications specified in the new
ADD/DROP syntax; in this case the error "publication name "mypub2"
used more than once" makes more sense, much like [2].

[1]
postgres=# select subpublications from pg_subscription;
 subpublications
-----------------
 {mypub,mypub1}

postgres=# alter subscription mysub add publication mypub2, mypub2;
ERROR:  publication "mypub2" is already present in the subscription

postgres=# select subpublications from pg_subscription;
    subpublications
-----------------------
 {mypub,mypub1,mypub2}

postgres=# alter subscription mysub drop publication mypub2, mypub2;
ERROR:  publication "mypub2" doesn't exist in the subscription

[2]
postgres=# alter subscription mysub set publication mypub2, mypub2;
ERROR:  publication name "mypub2" used more than once

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] pgbench: improve \sleep meta command

2021-03-21 Thread Fujii Masao




On 2021/03/19 10:43, Fujii Masao wrote:



On 2021/03/19 10:02, kuroda.hay...@fujitsu.com wrote:

Dear Fujii-san,

I confirmed your patch and some parse functions, and I agree
the check condition in evaluateSleep() is correct.
No problem is found.


Thanks for reviewing the patch!
Barring any objection, I will commit this patch.


Pushed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: shared-memory based stats collector

2021-03-21 Thread Kyotaro Horiguchi
At Mon, 22 Mar 2021 09:55:59 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Thu, 18 Mar 2021 01:47:20 -0700, Andres Freund  wrote 
> in 
> > Hi,
> > 
> > On 2021-03-18 16:56:02 +0900, Kyotaro Horiguchi wrote:
> > > At Tue, 16 Mar 2021 10:27:55 +0900 (JST), Kyotaro Horiguchi 
> > >  wrote in 
> > > Rebased and fixed two bugs.  Not addressed received comments in this
> > > version.
> > 
> > Since I am heavily editing the code, could you submit "functional"
> > changes (as opposed to fixing rebase issues) as incremental patches?
> 
> Oh.. please wait for.. a moment, maybe.

This is that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index c09fa026b9..3f546afe6a 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -1175,13 +1175,6 @@ pgstat_report_stat(bool force)
if (area == NULL)
return 0;
 
-   /*
-* We need a database entry if the following stats exists.
-*/
-   if (pgStatXactCommit > 0 || pgStatXactRollback > 0 ||
-   pgStatBlockReadTime > 0 || pgStatBlockWriteTime > 0)
-   get_local_dbstat_entry(MyDatabaseId);
-
/* Don't expend a clock check if nothing to do */
if (pgStatLocalHash == NULL && have_slrustats && !walstats_pending())
return 0;
@@ -1221,8 +1214,6 @@ pgstat_report_stat(bool force)
{
int remains = 0;
pgstat_localhash_iterator i;
-   List   *dbentlist = NIL;
-   ListCell   *lc;
PgStatLocalHashEntry *lent;
 
/* Step 1: flush out other than database stats */
@@ -1234,13 +1225,11 @@ pgstat_report_stat(bool force)
switch (lent->key.type)
{
case PGSTAT_TYPE_DB:
-
/*
 * flush_tabstat applies some of stats 
numbers of flushed
-* entries into local database stats. 
Just remember the
-* database entries for now then 
flush-out them later.
+* entries into local and shared 
database stats. Treat them
+* separately later.
 */
-   dbentlist = lappend(dbentlist, lent);
break;
case PGSTAT_TYPE_TABLE:
if (flush_tabstat(lent, nowait))
@@ -1268,9 +1257,11 @@ pgstat_report_stat(bool force)
}
 
/* Step 2: flush out database stats */
-   foreach(lc, dbentlist)
+   pgstat_localhash_start_iterate(pgStatLocalHash, &i);
+   while ((lent = pgstat_localhash_iterate(pgStatLocalHash, &i)) 
!= NULL)
{
-   PgStatLocalHashEntry *lent = (PgStatLocalHashEntry *) 
lfirst(lc);
+   /* no other types of entry must be found here */
+   Assert(lent->key.type == PGSTAT_TYPE_DB);
 
if (flush_dbstat(lent, nowait))
{
@@ -1281,8 +1272,6 @@ pgstat_report_stat(bool force)
pgstat_localhash_delete(pgStatLocalHash, 
lent->key);
}
}
-   list_free(dbentlist);
-   dbentlist = NULL;
 
if (remains <= 0)
{
@@ -1507,7 +1496,7 @@ flush_dbstat(PgStatLocalHashEntry *ent, bool nowait)
 
Assert(ent->key.type == PGSTAT_TYPE_DB);
 
-   localent = (PgStat_StatDBEntry *) &ent->body;
+   localent = (PgStat_StatDBEntry *) ent->body;
 
/* find shared database stats entry corresponding to the local entry */
sharedent = (PgStat_StatDBEntry *)
@@ -3215,11 +3204,8 @@ pgstat_fetch_stat_dbentry(Oid dbid)
/* should be called from backends */
Assert(IsUnderPostmaster);
 
-   /* the simple cache doesn't work properly for InvalidOid */
-   Assert(dbid != InvalidOid);
-
/* Return cached result if it is valid. */
-   if (cached_dbent_key.databaseid == dbid)
+   if (dbid != 0 && cached_dbent_key.databaseid == dbid)
return &cached_dbent;
 
shent = (PgStat_StatDBEntry *)


Re: Type of wait events WalReceiverWaitStart and WalSenderWaitForWAL

2021-03-21 Thread Kyotaro Horiguchi
At Thu, 18 Mar 2021 18:48:50 +0900, Fujii Masao  
wrote in 
> On 2021/03/17 15:31, Kyotaro Horiguchi wrote:
> > I think it'd be better that they are categorized by what it is waiting
> > for.
> 
> Yes. And some processes can be waiting for several events at the same
> moment. In this case we should pick the event that those processes
> are *mainly* waiting for, as the wait event, I think.

Right.

> > Activity is waiting for something gating me to be released.
> > IPC is waiting for the response to a request previously sent to
> > another process.
> > Wait-client is waiting for the peer over a network connection to allow
> > me to proceed with my activity.
> 
> I'm not sure if these definitions are really right or not because they
> seem to be slightly different from those in the document.

Maybe it depends on what "main processing loop" means. I found my
words were inaccurate. "Something gating me" meant the main
work. In the case of the walsender main loop, it's the advance of the
WAL flush location.  In a broader sense it is a kind of IPC in most
cases, but the difference is, as you said, in what the wait is waiting
for in those cases.

> > So whether the three fall into the same category or not doesn't matter
> > to me.
> 
> Understood.
> 
> 
> > WAIT_EVENT_WAL_RECEIVER_MAIN(WalReceiverMain) is waiting for new data
> > to arrive.  This looks like an activity to me.
> 
> +1. So our consensus is not to change the category of this event.

Agreed.

> > WAIT_EVENT_WAL_RECEIVER_WAIT_START is waiting for the startup
> > process to kick me.  So it may be either IPC or Activity.  Since
> > walreceiver hasn't sent anything to startup, it's activity rather
> > than IPC.  However, the behavior can be said to convey a piece of
> > information from startup to walreceiver, so it can also be said to be
> > an IPC. (That is the reason why I don't object to IPC.)
> 
> IMO this should be IPC because walreceiver is mainly waiting for the
> interaction with the startup process, during this wait event. Since
> you can
> live with IPC, probably our consensus is to use IPC?

Exactly.

> > 1 (WAIT_EVENT_WAL_SENDER_MAIN, currently an activity) is waiting for
> > something to happen on the connection to the peer
> > receiver/worker. This might either be an activity or a wait_client,
> > but I prefer it to be wait_client, as the same behavior of a client
> > backend is categorized as wait_client.
> 
> Yes, walsender is waiting for replies from the standby to arrive
> during
> this event. But I think that it's *mainly* waiting for WAL to be
> flushed
> in order to send it. So IPC is better for this event rather than
> Client.
> On the other hand, wait events reported in main loop are basically
> categorized in Activity, in other processes. So in the sake of
> consistency,
> I like Activity rather than IPC, for this event.

Mmm. I agree that it waits for WAL in most cases, but still WAL-wait
is activity for me because it is not waiting to be cued by someone,
but waiting for new WAL to arrive so it can perform its main purpose.
If it's an IPC, should all waits other than pure sleep fall into
IPC?  (I was confused by the comment of WalSndWait, which doesn't
state that it is also waiting for the latch.)

Another point I'd like to raise is that the client_wait case should be
distinct from the WAL-wait, since it is a significant sign of what is
happening.

So I propose two changes here.

a. Rewrite the comment of WalSndWait so that it states that "also
  waiting for latch-set".

b. Split the event to two different events.

-   WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_MAIN);
+   WalSndWait(wakeEvents, sleeptime,
+  pq_is_send_pending() ? WAIT_EVENT_WAL_SENDER_WRITE_DATA:
+ WAIT_EVENT_WAL_SENDER_MAIN);

And _WRITE_DATA as client_wait and _SENDER_MAIN as activity.

What do you think about this?

> > 2 (WAIT_EVENT_WAL_SENDER_WRITE_DATA, currently a wait_client) is the
> > same to 1.
> 
> IIUC walsender is mainly waiting for the socket to be writeable, to
> send
> any pending data. So I agree to use Client for this event. Our
> consensus
> seems not to change the category of this event.

Right.

> > 3 (WAIT_EVENT_WAL_SENDER_WAIT_WAL, currently a wait_client) is the
> > same to 1.
> 
> Yes, walsender is waiting for replies from the standby to arrive
> during
> this event. But I think that it's *mainly* waiting for WAL to be
> flushed
> in order to send it. So IPC is better for this event rather than
> Client.
>
> On the other hand, while the server is idle, this event is reported
> for the logical walsender. This makes me think that it might be Activity,
> i.e.,
> we should treat this as the wait event in logical walsender's main
> loop.
> So I like Activity rather than IPC, for this event.
> If we do this, it might be better to rename the event to
> WAIT_EVENT_LOGICAL_SENDER_MAIN.

Yes. The WAIT_EVENT_WAL_SENDER_WAIT_WAL is equivalent to
WAIT_EVENT_WAL_SENDER_MAIN as function.  So I think it 

Re: Replication slot stats misgivings

2021-03-21 Thread Amit Kapila
On Mon, Mar 22, 2021 at 3:10 AM Andres Freund  wrote:
>
> On 2021-03-21 16:08:00 +0530, Amit Kapila wrote:
> > On Sun, Mar 21, 2021 at 2:57 AM Andres Freund  wrote:
> > > On 2021-03-20 10:28:06 +0530, Amit Kapila wrote:
> > > > On Sat, Mar 20, 2021 at 9:25 AM Amit Kapila  
> > > > wrote:
> > > > > This idea is worth exploring to address the complaints but what do we
> > > > > do when we detect that the stats are from the different slot? It has
> > > > > mixed of stats from the old and new slot. We need to probably reset it
> > > > > after we detect that.
> > > > >
> > > >
> > > > What if the user created a slot with the same name after dropping the
> > > > slot and it has used the same index. I think chances are less but
> > > > still a possibility, but maybe that is okay.
> > > >
> > > > > What if after some frequency (say whenever we
> > > > > run out of indexes) we check whether the slots we are maintaining is
> > > > > pgstat.c have some stale slot entry (entry exists but the actual slot
> > > > > is dropped)?
> > > > >
> > > >
> > > > A similar drawback (the user created a slot with the same name after
> > > > dropping it) exists with this as well.
> > >
> > > pgstat_report_replslot_drop() already prevents that, no?
> > >
> >
> > Yeah, normally it would prevent that but what if a drop message is lost?
>
> That already exists as a danger, no? pgstat_recv_replslot() uses
> pgstat_replslot_index() to find the slot by name. So if a drop message
> is lost we'd potentially accumulate into stats of an older slot.  It'd
> probably be a lower risk with what I suggested, because the initial stat
> report from slot.c would use something like pgstat_report_replslot_create(),
> which the stats collector can use to reset the stats to 0?
>

Okay, but I guess if we miss the create message as well, then we will
have a similar danger. I think the benefit your idea will bring is to
use index-based lookup instead of name-based lookup. IIRC, we
initially used the name here because we thought there is nothing like
an OID for slots, but your suggestion of using
ReplicationSlotCtl->replication_slots can address that.

> If we do it right the lossiness will be removed via shared memory stats
> patch...
>

Okay.

-- 
With Regards,
Amit Kapila.




Re: Why logical replication launcher exits 1?

2021-03-21 Thread Tatsuo Ishii
> On Mon, Mar 22, 2021 at 1:11 PM Tatsuo Ishii  wrote:
>> It seems only logical replication launcher exited with exit code 1
>> when it received shutdown request. Why?
> 
> FWIW here's an earlier discussion of that topic:
> 
> https://www.postgresql.org/message-id/flat/CAEepm%3D1c3hG1g3iKYwfa_PDsT49RBaBJsaot_qNhPSCXBm9rzA%40mail.gmail.com

Thank you for pointing it out. I will look into the discussion.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: Columns correlation and adaptive query optimization

2021-03-21 Thread Yugo NAGATA
On Fri, 19 Mar 2021 19:58:27 +0300
Konstantin Knizhnik  wrote:

> 
> 
> On 19.03.2021 12:17, Yugo NAGATA wrote:
> > On Wed, 10 Mar 2021 03:00:25 +0100
> > Tomas Vondra  wrote:
> >
> >> What is being proposed here - an extension suggesting which statistics
> >> to create (and possibly creating them automatically) is certainly
> >> useful, but I'm not sure I'd call it "adaptive query optimization". I
> >> think "adaptive" means the extension directly modifies the estimates
> >> based on past executions. So I propose calling it maybe "statistics
> >> advisor" or something like that.
> > I also agree with the idea to implement this feature as a new
> > extension for a statistics advisor.
> >
> >> BTW Why is "qual" in
> >>
> >>static void
> >>AddMultiColumnStatisticsForQual(void* qual, ExplainState *es)
> >>
> >> declared as "void *"? Shouldn't that be "List *"?
> > When I tested this extension using TPC-H queries, it raised segmentation
> > fault in this function. I think the cause would be around this argument.
> >
> > Regards,
> > Yugo Nagata
> >
> Attached please find a new version of the patch with the
> AddMultiColumnStatisticsForQual parameter type fix and one more fix
> related to handling synthetic attributes.
> I cannot reproduce the crash on TPC-H queries, so if the problem
> persists, can you please send me a stack trace and maybe some other
> information to help understand the reason for the SIGSEGV?

I also could not reproduce the segfault. I don't know why I observed it,
but it may be because I missed something when installing. Sorry for
annoying you.

Instead, I observed "ERROR:  cache lookup failed for attribute 6 of
relation " in v8 patch, but this was fixed in v9 patch.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: Add client connection check during the execution of the query

2021-03-21 Thread Thomas Munro
On Sat, Mar 6, 2021 at 5:50 PM Zhihong Yu  wrote:
> +   if (client_connection_check_interval > 0)
> +   enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT,
>
> +   /* Start timeout for checking if the client has gone away if necessary. */
> +   if (client_connection_check_interval)

Thanks!  Fixed.

> +Sets a time interval, in milliseconds, between periodic
>
> I wonder if the interval should be expressed in seconds. Since the 
> description says: while running very long queries.

Hmm.  Personally I think we should stop emphasising default units
(it's an internal detail) and encourage people to specify the units.
But that was indeed not good wording, so I fixed it.  I've now
rewritten the documentation completely, and moved it to the connection
section near the related TCP keepalive settings.  I tried to be really
explicit about what this feature really does, and in which cases it
helps, and what behaviour you can expect without this feature
configured.

Other changes:

1.  Fixed inconsistencies in .conf.sample and GUC descriptions from
Ishii-san's review.
2.  Added comments to pg_check_client_connection() to explain the
conditional logic therein, justifying each conditional branch and the
interactions between this logic and the PqCommReadingMsg and
ClientConnectionLost variables.
3.  I added an ereport(COMMERROR) message for unexpected errnos in
this path, since otherwise an errno would be discarded making
diagnosis impossible.
4.  Avoided calling enable_timeout_after() multiple times, like we do
for statement timeouts.
5.  cfbot told me "Could not determine contrib module type for
connection" on Windows.  I do not understand this stuff, so I just
added the new test module "connection" to @contrib_excludes, like
everyone else apparently does.
6.  pgindented.

That's enough for today, but here are some things I'm still wondering about:

1.  Might need to rethink the name of the GUC.  By including "client"
in it, it sounds a bit like it affects behaviour of the client, rather
than the server.  Also the internal variable
CheckClientConnectionPending looks funny next to ClientConnectionLost
(could be ClientConnectionCheckPending?).
2.  The tests need tightening up.  The thing with the "sleep 3" will
not survive contact with the build farm, and I'm not sure if the SSL
test is as short as it could be.
3.  Needs testing on Windows.

I've now hacked this code around so much that I've added myself as
co-author in the commit message.
From 9a8af3fe03d9d7673f163e6087dd724acc40161d Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 1 Mar 2021 18:08:23 +1300
Subject: [PATCH v5] Detect dropped connections while running queries.

Provide a new optional GUC that can be used to check whether the client
connection has gone away periodically while running very long queries.

Author: Sergey Cherkashin 
Author: Thomas Munro 
Reviewed-by: Thomas Munro 
Reviewed-by: Tatsuo Ishii 
Reviewed-by: Konstantin Knizhnik 
Reviewed-by: Zhihong Yu 
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
---
 doc/src/sgml/config.sgml  | 36 +++
 src/backend/libpq/pqcomm.c| 91 ++
 src/backend/tcop/postgres.c   |  8 ++
 src/backend/utils/init/globals.c  |  1 +
 src/backend/utils/init/postinit.c | 11 +++
 src/backend/utils/misc/guc.c  | 10 ++
 src/backend/utils/misc/postgresql.conf.sample |  3 +
 src/include/libpq/libpq.h |  2 +
 src/include/miscadmin.h   |  3 +-
 src/include/utils/timeout.h   |  1 +
 src/test/modules/Makefile |  1 +
 src/test/modules/connection/Makefile  | 16 
 .../connection/t/001_close_connection.pl  | 96 +++
 src/tools/msvc/Mkvcbuild.pm   |  4 +-
 14 files changed, 281 insertions(+), 2 deletions(-)
 create mode 100644 src/test/modules/connection/Makefile
 create mode 100644 src/test/modules/connection/t/001_close_connection.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ee4925d6d9..fc463f0a65 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -998,6 +998,42 @@ include_dir 'conf.d'
   
  
 
+ 
+  client_connection_check_interval (integer)
+  
+   client_connection_check_interval configuration parameter
+  
+  
+  
+   
+Sets the time interval between checks that the client is still
+connected, while running queries.  The check is performed by testing
+whether one byte could be read from the socket with
+MSG_PEEK.  If the kernel reports that the connection
+has been closed or lost, a long running query can abort immediately,
+rather than discovering the problem when it eventually tries to send
+the response.
+   
+   
+If this value is specified without units, it is taken as 
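
To make the check described in the documentation paragraph above concrete, here is a minimal standalone sketch of peeking one byte from the socket without consuming it. This is illustrative only, not the patch's code; it assumes MSG_DONTWAIT (a Linux extension) for non-blocking behavior, whereas the server performs the check on a socket it already manages as non-blocking.

#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <stdbool.h>

static bool
connection_is_lost(int sock)
{
	char		c;
	ssize_t		r = recv(sock, &c, 1, MSG_PEEK | MSG_DONTWAIT);

	if (r == 0)
		return true;	/* EOF: client closed the connection cleanly */
	if (r < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
		return true;	/* ECONNRESET and friends: connection lost */
	return false;		/* data pending, or nothing to read yet */
}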

Re: replication cleanup code incorrect way to use of HTAB HASH_REMOVE ?

2021-03-21 Thread Amit Kapila
On Mon, Mar 22, 2021 at 3:20 AM Peter Smith  wrote:
>
> On Sun, Mar 21, 2021 at 8:54 PM Amit Kapila  wrote:
> >
> > On Sat, Mar 20, 2021 at 12:54 PM Peter Smith  wrote:
> > >
> > > PSA my patch to correct this by firstly doing a HASH_FIND, then only
> > > HASH_REMOVE after we've finished using the ent.
> > >
> >
> > Why can't we keep using HASH_REMOVE as it is but get the output (entry
> > found or not) in the last parameter of hash_search API and then
> > perform Assert based on that? See similar usage in reorderbuffer.c and
> > rewriteheap.c.
> >
>
> Changing the Assert doesn't do anything to fix the problem as
> described, i.e. dereferencing of ent after the HASH_REMOVE.
>
> The real problem isn't the Assert. It's all those other usages of ent
> disobeying the API rule: "(NB: in the case of the REMOVE action, the
> result is a dangling pointer that shouldn't be dereferenced!)"
>

Right, that is a problem. I see that your patch will fix it. Thanks.
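
For readers unfamiliar with the dynahash rule being discussed, a hedged sketch of the corrected pattern (assuming PostgreSQL headers): fetch with HASH_FIND, finish using the entry, and only then HASH_REMOVE. The hash_search() signature is dynahash's real API; the entry type and cleanup_fileset() are illustrative names, not the ones in the patch.

typedef struct StreamXidEnt
{
	TransactionId	xid;		/* hash key */
	SharedFileSet  *fileset;	/* per-xid resource to clean up */
} StreamXidEnt;

static void
remove_xid_entry(HTAB *xidhash, TransactionId xid)
{
	bool			found;
	StreamXidEnt   *ent;

	ent = (StreamXidEnt *) hash_search(xidhash, &xid, HASH_FIND, &found);
	if (found)
	{
		cleanup_fileset(ent->fileset);	/* safe: entry is still live */

		/*
		 * Only now remove it.  HASH_REMOVE returns a dangling pointer
		 * that must not be dereferenced.
		 */
		(void) hash_search(xidhash, &xid, HASH_REMOVE, NULL);
	}
}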

-- 
With Regards,
Amit Kapila.




Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

2021-03-21 Thread Fujii Masao



On 2021/03/19 18:27, Fujii Masao wrote:



On 2021/03/19 15:06, shinya11.k...@nttdata.com wrote:

But 0 value maybe looks strange, so in current version I show it like below:
Type                  N     (%)  Record size     (%)  FPI size   (%)  Combined size     (%)
----                  -     ---  -----------     ---  --------   ---  -------------     ---
XLOG/SWITCH_JUNK      -  (    -)    11006248  (72.26)        -  (  -)      11006248  (65.78)
Transaction/COMMIT   10  ( 0.03)         340  ( 0.00)        0  (0.00)           340  ( 0.00)


I just wanted to know why the "count" and "fpi_len" fields are 0.
So, it would be nice to have 0 values. Sorry for confusing you.


Kato, it's not clear to me if you were asking for - to be changed back to 0?

You marked the patch as Ready for Committer so I assume not, but it would be
better to say clearly that you think this patch is ready for a committer to 
look at.


Yes, I don't think it needs to be changed back to 0.
I think this patch is ready for a committer to look at.


What's the use case of this feature? I read through this thread briefly,
but I'm still not sure how useful this feature is.

Horiguchi-san reported one issue upthread; --stats=record shows
two lines for Transaction/COMMIT record. Probably this should be
fixed separately.

Horiguchi-san,
Do you have updated version of that bug-fix patch?
Or you started another thread for that issue?


I confirmed that only XACT records need to be processed differently.
So the patch that Horiguchi-san posted upthread looks good and sufficient
to me. I added a bit more detailed information to the comment in the patch.
Attached is the updated version of the patch. Since this issue looks like
a bug, I'm thinking of back-patching it. Thoughts?

Barring any objection, I will commit this.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 610f65e471..4e90d82c7d 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -438,6 +438,15 @@ XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats 
*stats,
 
recid = XLogRecGetInfo(record) >> 4;
 
+   /*
+* XACT records need to be handled differently. Those records use the
+* first bit of thoes four bits for an optional flag variable and the
+* following three bits for the opcode. We filter opcode out of xl_info
+* and use it as the identifier of the record.
+*/
+   if (rmid == RM_XACT_ID)
+   recid &= 0x07;
+
stats->record_stats[rmid][recid].count++;
stats->record_stats[rmid][recid].rec_len += rec_len;
stats->record_stats[rmid][recid].fpi_len += fpi_len;


Re: row filtering for logical replication

2021-03-21 Thread Euler Taveira
On Thu, Mar 18, 2021, at 7:51 AM, Rahila Syed wrote:
> 1. 
> I think the docs are being incorrectly updated to add a column to 
> pg_partitioned_table
> instead of pg_publication_rel.
Good catch.

> 2.   +typedef struct PublicationRelationQual
>  +{
> +   Oid relid;
> +   Relationrelation;
> +   Node   *whereClause;
> +} PublicationRelationQual;
> 
> Can this be given a more generic name like PublicationRelationInfo, so that 
> the same struct 
> can be used to store additional relation information in future, for ex. 
> column names, if column filtering is introduced.
Good idea. I rename it and it'll be in this next patch set.

> 3. Also, in the above structure, it seems that we can do with storing just 
> relid and derive relation information from it
> using table_open when needed. Am I missing something?
We need the Relation. See OpenTableList(). The way this code is organized, it
opens all publication tables and appends each Relation to a list. This list is
used in PublicationAddTables() to update the catalog. I tried to minimize the
number of refactorings while introducing this feature. We could probably revise
this code in the future (someone said in a previous discussion that it is weird
to open relations in one source code file -- publicationcmds.c -- and use them
in another one -- pg_publication.c).

> 4.  Currently in logical replication, I noticed that an UPDATE is being 
> applied on the subscriber even if the column values
>  are unchanged. Can row-filtering feature be used to change it such that, 
> when all the OLD.columns = NEW.columns, filter out 
> the row from being sent to the subscriber. I understand this would need 
> REPLICA IDENTITY FULL to work, but would be an
> improvement from the existing state.
This is how Postgres works.

postgres=# create table foo (a integer, b integer);
CREATE TABLE
postgres=# insert into foo values(1, 100);
INSERT 0 1
postgres=# select ctid, xmin, xmax, a, b from foo;
 ctid  |  xmin  | xmax | a |  b
-------+--------+------+---+-----
 (0,1) | 488920 |    0 | 1 | 100
(1 row)

postgres=# update foo set b = 101 where a = 1;
UPDATE 1
postgres=# select ctid, xmin, xmax, a, b from foo;
 ctid  |  xmin  | xmax | a |  b
-------+--------+------+---+-----
 (0,2) | 488921 |    0 | 1 | 101
(1 row)

postgres=# update foo set b = 101 where a = 1;
UPDATE 1
postgres=# select ctid, xmin, xmax, a, b from foo;
 ctid  |  xmin  | xmax | a |  b
-------+--------+------+---+-----
 (0,3) | 488922 |    0 | 1 | 101
(1 row)

You could probably abuse this feature and skip some UPDATEs when the old tuple
is identical to the new tuple. The question is: why would someone issue the
same command multiple times? A broken application? I would say: don't do it.
Besides that, this feature could impose an overhead on a code path that already
consumes substantial CPU time. I've seen some tables with REPLICA IDENTITY FULL
and dozens of columns that would certainly contribute to increasing the
replication lag.

> 5. Currently, any existing rows that were not replicated, when updated to 
> match the publication quals
> using UPDATE tab SET pub_qual_column = 'not_filtered' where a = 1; won't be 
> applied, as row 
> does not exist on the subscriber.  It would be good if ALTER SUBSCRIBER 
> REFRESH PUBLICATION
> would help fetch such existing rows from publishers that match the qual 
> now(either because the row changed
> or the qual changed)
I see. This should be addressed by a resynchronize feature. Such an option is
useful when you have to change the row filter. It should certainly be
implemented as an ALTER SUBSCRIPTION subcommand.

I attached a new patch set that addresses:

* fix documentation;
* rename PublicationRelationQual to PublicationRelationInfo;
* remove the memset that was leftover from a previous patch set;
* add new tests to improve coverage (INSERT/UPDATE/DELETE to exercise the row
  filter code).


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From a6d893be0091bc8cd8569ac380e6f628263d31c0 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Mon, 18 Jan 2021 11:53:34 -0300
Subject: [PATCH v12 1/5] Rename a WHERE node

A WHERE clause will be used for row filtering in logical replication. We
already have a similar node: 'WHERE (condition here)'. Let's rename the
node to a generic name and use it for row filtering too.
---
 src/backend/parser/gram.y | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index bc43641ffe..22bbb27041 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -495,7 +495,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <node>	def_arg columnElem where_clause where_or_current_clause
 a_expr b_expr c_expr AexprConst indirection_el opt_slice_bound
 columnref in_expr having_clause func_table xmltable array_expr
-ExclusionWhereClause operator_def_arg
+OptWhereClause operator_def_arg
 %type 	rowsfrom_item 

Re: row filtering for logical replication

2021-03-21 Thread Euler Taveira
On Tue, Mar 9, 2021, at 12:05 PM, Rahila Syed wrote:
> Please find some comments below:
Thanks for your review.

> 1. If the where clause contains non-replica identity columns, the delete 
> performed on a replicated row
>  using DELETE from pub_tab where repl_ident_col = n;
> is not being replicated, as logical replication does not have any info 
> whether the column has 
> to be filtered or not. 
> Shouldn't a warning be thrown in this case to notify the user that the delete 
> is not replicated.
Isn't documentation enough? If you add a WARNING, it should be printed per row,
hence, a huge DELETE will flood the client with WARNING messages by default. If
you are thinking about LOG messages, it is a different story. However, we
should limit those messages to one per transaction. Even if we add such an aid,
it would impose a performance penalty for checking whether the DELETE is not
replicated because the row filter contains a column that is not part of the PK
or REPLICA IDENTITY. If I were to add any message, it would be to warn at
creation time (CREATE PUBLICATION or ALTER PUBLICATION ... [ADD|SET] TABLE).

> 2. Same for update, even if I update a row to match the quals on publisher, 
> it is still not being replicated to 
> the subscriber. (if the quals contain non-replica identity columns). I think 
> for UPDATE at least, the new value
> of the non-replicate identity column is available which can be used to filter 
> and replicate the update.
Indeed, the row filter for UPDATE uses the new tuple. Maybe your non-replica
identity column contains NULL, which makes the expression evaluate to false.

> 3. 0001.patch, 
> Why is the name of the existing ExclusionWhereClause node being changed, if 
> the exact same definition is being used?
Because this node ExclusionWhereClause is used for exclusion constraints. This
patch renames the node to make it clear it is a generic node that could be used
for other filtering features in the future.

> For 0002.patch,
> 4.   +
>  +   memset(lrel, 0, sizeof(LogicalRepRelation));
> 
> Is this needed? Apart from the above, the patch does not use or update lrel at
> all in that function.
Good catch. It is a leftover from a previous patch. It will be fixed in the
next patch set.

> 5.  PublicationRelationQual and PublicationTable have similar fields, can 
> PublicationTable
> be used in place of PublicationRelationQual instead of defining a new struct?
I don't think it is a good idea to have additional fields in a parse node. The
DDL commands use Relation (PublicationTableQual) and parse code uses RangeVar
(PublicationTable). publicationcmds.c uses Relation everywhere so I decided to
create a new struct to store Relation and qual as a list item. It also
minimizes the number of places you have to modify.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Why logical replication launcher exits 1?

2021-03-21 Thread Thomas Munro
On Mon, Mar 22, 2021 at 1:11 PM Tatsuo Ishii  wrote:
> It seems only the logical replication launcher exited with exit code 1
> when it received the shutdown request. Why?

FWIW here's an earlier discussion of that topic:

https://www.postgresql.org/message-id/flat/CAEepm%3D1c3hG1g3iKYwfa_PDsT49RBaBJsaot_qNhPSCXBm9rzA%40mail.gmail.com




PostgreSQL 14 Feature Freeze + Release Management Team (RMT)

2021-03-21 Thread Michael Paquier
Hi,

The Release Management Team (RMT) for PostgreSQL 14 is assembled
and has determined that the feature freeze date for the PostgreSQL 14
release will be April 7, 2021.  This means that any feature for the
PostgreSQL 14 release *must be committed by April 7, 2021 AoE*
("anywhere on earth", see [1]).  In other words, by April 8, it is too
late.

This naturally extends the March 2021 Commit Fest to April 7, 2021.
After the freeze is in effect, any open feature in the current Commit
Fest will be moved into the future one.

Open items for the PostgreSQL 14 release will be tracked here:
https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items

For the PostgreSQL 14 release, the release management team is composed
of:
Peter Geoghegan 
Andrew Dunstan 
Michael Paquier 

For the time being, if you have any questions about the process,
please feel free to email any member of the RMT.  We will send out
notes with updates and additional guidance in the near future.

[1] https://en.wikipedia.org/wiki/Anywhere_on_Earth

On behalf of the RMT, thanks,
--
Michael


signature.asc
Description: PGP signature


Re: shared-memory based stats collector

2021-03-21 Thread Kyotaro Horiguchi
At Thu, 18 Mar 2021 01:47:20 -0700, Andres Freund  wrote in 
> Hi,
> 
> On 2021-03-18 16:56:02 +0900, Kyotaro Horiguchi wrote:
> > At Tue, 16 Mar 2021 10:27:55 +0900 (JST), Kyotaro Horiguchi 
> >  wrote in 
> > > At Mon, 15 Mar 2021 17:49:36 +0900 (JST), Kyotaro Horiguchi 
> > >  wrote in 
> > > > Thanks for committing this!  I'm very happy to see this reduces the
> > > > size of this patchset.
> > > 
> > > Now that 0003 is committed as d75288fb27, and 33394ee6f2 conflicts
> > > with old 0004, I'd like to post a rebased version for future work.
> > > 
> > > The commit 33394ee6f2 adds on-exit forced write of WAL stats on
> > > walwriter and in this patch that part would appear to have been
> > > removed.  However, this patchset already does that by calling to
> > > pgstat_report_stat from pgstat_beshutdown_hook.
> > 
> > Rebased and fixed two bugs.  Not addressed received comments in this
> > version.
> 
> Since I am heavily editing the code, could you submit "functional"
> changes (as opposed to fixing rebase issues) as incremental patches?

Oh.. please wait for.. a moment, maybe.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-03-21 Thread ikedamsh
On 2021-03-19 16:30, Fujii Masao wrote:
> On 2021/03/15 10:39, Masahiro Ikeda wrote:
>> Thanks, I understood that get_sync_bit() checks the sync flags and
>> that the write units of generated WAL data and replicated WAL data are
>> different.
>> (It's an interesting optimization whether to use the kernel cache or not.)
>> 
>> OK. Although I agree with separating the stats for the walreceiver,
>> I want to hear opinions from other people too. I didn't change the
>> patch.
>> 
>> Please feel free to send your comments.
> 
> What about applying the patch for common WAL write function like
> XLogWriteFile(), separately from the patch for walreceiver's stats?
> Seems the former reaches the consensus, so we can commit it firstly.
> Also even only the former change is useful because which allows
> walreceiver to report WALWrite wait event.

Agreed. I separated the patches.

If only the former is committed, my trivial concern is that there may be
a disadvantage, but no advantage, for the standby server. Calling
INSTR_TIME_SET_CURRENT() may degrade the walreceiver's performance, while
the stats are not visible to users until the latter patch is committed.

I think that's acceptable because it doesn't happen when
"track_wal_io_timing" is disabled in the standby server. Some users may start
the standby server from a backup in which "track_wal_io_timing" was enabled on
the primary server, but that should be fine since those users have already
accepted the performance degradation on the primary server.

>> OK. I agree.
>> 
>> I wonder to change the error check ways depending on who calls this 
>> function?
>> Now, only the walreceiver checks (1)errno==0 and doesn't check 
>> (2)errno==ENITR.
>> Other processes are the opposite.
>> 
>> IIUC, it's appropriate that every process checks (1)(2).
>> Please let me know my understanding is wrong.
> 
> I'm thinking the same. Regarding (2), commit 79ce29c734 introduced
> that code. According to the following commit log, it seems harmless
> to retry on EINTR even walreceiver.
> 
> Also retry on EINTR. All signals used in the backend are flagged 
> SA_RESTART
> nowadays, so it shouldn't happen, but better to be defensive.

Thanks, I understood.


>>> BTW, currently XLogWrite() increments IO timing even when pg_pwrite()
>>> reports an error. But this is useless. Probably IO timing should be
>>> incremented after the return code of pg_pwrite() is checked, instead?
>> 
>> Yes, I agree. I fixed it.
>> (v18-0003-Makes-the-wal-receiver-report-WAL-statistics.patch)
> 
> Thanks for the patch!
> 
>   nleft = nbytes;
>   do
>   {
> - errno = 0;
> + written = XLogWriteFile(openLogFile, from, nleft, (off_t) startoffset,
> +                         ThisTimeLineID, openLogSegNo, wal_segment_size);
> 
> Can we merge this do-while loop in XLogWrite() into the loop
> in XLogWriteFile()?
> If we do that, ISTM that the following code is not necessary in 
> XLogWrite().
> 
>   nleft -= written;
>   from += written;

OK, I fixed it.
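For reference, a minimal sketch of the merged loop's shape (illustrative
only; the actual patch also handles wait events and error reporting):

static ssize_t
XLogWriteFileSketch(int fd, const char *buf, size_t nbytes, off_t offset)
{
    size_t      nleft = nbytes;

    do
    {
        ssize_t     written;

        errno = 0;
        written = pg_pwrite(fd, buf, nleft, offset);
        if (written <= 0)
        {
            if (errno == EINTR)
                continue;       /* retry on interrupted write */
            if (errno == 0)
                errno = ENOSPC; /* zero bytes written: out of space */
            return -1;          /* caller reports the error */
        }
        nleft -= written;
        buf += written;
        offset += written;
    } while (nleft > 0);

    return (ssize_t) nbytes;
}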


> + * 'segsize' is a segment size of WAL segment file.
> 
> Since segsize is always wal_segment_size, segsize argument seems
> not necessary in XLogWriteFile().

Right. I fixed it.


> +XLogWriteFile(int fd, const void *buf, size_t nbyte, off_t offset,
> +   TimeLineID timelineid, XLogSegNo segno, int segsize)
> 
> Why did you use "const void *" instead of "char *" for *buf?

I followed the argument type of pg_pwrite().
But I think "char *" is better, so I fixed it.


> Regarding 0005 patch, I will review it later.

Thanks.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index a7a94d2a83..df028c5039 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -771,6 +771,9 @@ WalRcvDie(int code, Datum arg)
 	/* Ensure that all WAL records received are flushed to disk */
 	XLogWalRcvFlush(true);
 
+	/* Send WAL statistics to the stats collector before terminating */
+	pgstat_send_wal(true);
+
 	/* Mark ourselves inactive in shared memory */
	SpinLockAcquire(&walrcv->mutex);
 	Assert(walrcv->walRcvState == WALRCV_STREAMING ||
@@ -910,6 +913,12 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 	XLogArchiveForceDone(xlogfname);
 else
 	XLogArchiveNotify(xlogfname);
+
+/*
+ * Send WAL statistics to the stats collector when finishing
+ * the current WAL segment file to avoid overloading it.
+ */
+pgstat_send_wal(false);
 			}
 			recvFile = -1;
 



diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bd5e787e55..4c7d90f1b9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2536,61 +2536,14 @@ 

Re: Using COPY FREEZE in pgbench

2021-03-21 Thread Tatsuo Ishii
> I did a few tests on my laptop. It seems that copying takes a little
> more time, say about 10%, but vacuum is indeed very significantly
> reduced, so that the total time for copying and vacuuming is reduced
> by 10% overall.
> 
> So it is okay for me.

Thanks for the test.

I wrote:
> Curent master:
> pgbench -i -s 100
> :
> :
> done in 70.78 s (drop tables 0.21 s, create tables 0.02 s, client-side 
> generate 12.42 s, vacuum 51.11 s, primary keys 7.02 s).
> 
> Using FREEZE:
> done in 16.86 s (drop tables 0.20 s, create tables 0.01 s, client-side 
> generate 11.86 s, vacuum 0.25 s, primary keys 4.53 s).
> 
> As you can see total time drops from 70.78 seconds to 16.86 seconds,
> that is 4.1 times faster. This is mainly because vacuum takes only
> 0.25 seconds after COPY FREEZE while unpatched pgbench takes 51.11
> seconds, which is 204 times slower.

I did the same test again.

13.2 pgbench + master branch server:
done in 15.47 s (drop tables 0.19 s, create tables 0.01 s, client-side generate 
9.07 s, vacuum 2.07 s, primary keys 4.13 s).

With patch on master branch:
done in 13.38 s (drop tables 0.19 s, create tables 0.01 s, client-side generate 
9.68 s, vacuum 0.23 s, primary keys 3.27 s).

This time the current pgbench performs much faster than what I reported
before (15.47 s vs. 70.78 s). I don't know why.

Anyway, this time the total pgbench time is reduced by 14% overall
here. I hope people agree that the patch is worth the gain.
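For anyone curious about the mechanics, the gist of the change is just the
COPY statement pgbench issues during client-side data generation, roughly
like this (a hedged sketch; the exact condition in the patch may differ):

    /*
     * FREEZE requires the table to have been created or truncated in the
     * same transaction, which pgbench's initialization already guarantees;
     * it is skipped for partitioned runs.
     */
    if (partition_method == PART_NONE)
        copy_statement = "copy pgbench_accounts from stdin with (freeze on)";
    else
        copy_statement = "copy pgbench_accounts from stdin";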

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Why logical replication launcher exits 1?

2021-03-21 Thread Tatsuo Ishii
I noticed this when executing "pg_ctl stop".

11799 2021-03-22 07:28:19 JST LOG:  received fast shutdown request
11799 2021-03-22 07:28:19 JST LOG:  aborting any active transactions
11799 2021-03-22 07:28:20 JST LOG:  background worker "logical replication 
launcher" (PID 11807) exited with exit code 1
11802 2021-03-22 07:28:20 JST LOG:  shutting down
11799 2021-03-22 07:28:20 JST LOG:  database system is shut down

It seems only the logical replication launcher exited with exit code 1
when it received the shutdown request. Why?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: Log message for GSS connection is missing once connection authorization is successful.

2021-03-21 Thread Michael Paquier
On Sun, Mar 21, 2021 at 05:53:04PM +0530, Bharath Rupireddy wrote:
> +1 from me.  So, after every call to test_access, the node's current
> logfile gets truncated and we don't need the logging collector process
> to step in for rotation of the logfile.
> 
> The patch looks good to me and the kerberos regression tests pass with it.

Thanks Stephen and Bharath for looking at it!  I have applied that
now.
--
Michael


signature.asc
Description: PGP signature


Re: PITR promote bug: Checkpointer writes to older timeline

2021-03-21 Thread Michael Paquier
On Thu, Mar 18, 2021 at 12:56:12PM +0900, Michael Paquier wrote:
> I was looking at uses of ThisTimeLineID in the wild, and could not
> find it getting checked or used actually in backend-side code that
> involved the WAL reader facility.  Even if it brings confidence, it
> does not mean that it is not used somewhere :/

I have been working on that over the last couple of days, and applied
a fix down to 10.  One thing that I did not like in the test was the
use of compare() to check if the contents of the WAL segment before
and after the timeline jump remained the same as this would have been
unstable with any concurrent activity.  Instead, I have added a phase
at the end of the test with an extra checkpoint and recovery triggered
once, which is enough to reproduce the PANIC reported at the top of
the thread.

I'll look into clarifying the use of ThisTimeLineID within those
WAL reader callbacks, because this is really bug-prone in the long
term...  This requires some coordination with the recent work aimed at
adding some logical decoding support in standbys, though.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Custom compression methods

2021-03-21 Thread Justin Pryzby
On Sat, Mar 20, 2021 at 06:20:39PM -0500, Justin Pryzby wrote:
> Rebased on HEAD.
> 0005 forgot to update compression_1.out.
> Included changes to ./configure.ac and some other patches, but not Tomas's,
> since it'll make CFBOT get mad as soon as that's pushed.

Rebased again.
Renamed "t" to badcompresstbl to avoid name conflicts.
Polished the enum GUC patch some.

I noticed that TOAST_INVALID_COMPRESSION_ID was unused ... but then I found a
use for it.
From edc0e6ab248600c6d33dd681359554550c88a679 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 19 Mar 2021 19:12:53 -0500
Subject: [PATCH v3 01/12] Add docs for default_toast_compression..

bbe0a81db69bd10bd166907c3701492a29aca294
---
 doc/src/sgml/config.sgml | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ee4925d6d9..5cb851a5eb 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8151,6 +8151,29 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  default_toast_compression (string)
+  
+   default_toast_compression configuration parameter
+  
+  
+  
+   
+This parameter specifies the default compression method to use
+when compressing data in TOAST tables.
+It applies only to variable-width data types.
+It may be overriden by compression clauses in the
+CREATE command, or changed after the relation is
+created by ALTER TABLE ... SET COMPRESSION.
+
+The supported compression methods are pglz and
+(if configured at the time PostgreSQL was
+built) lz4.
+The default is pglz.
+   
+  
+ 
+
  
   temp_tablespaces (string)
   
-- 
2.17.0

From 02a9038381bdfe577aa32b67bcd231acd61f2c20 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 19 Mar 2021 21:52:28 -0500
Subject: [PATCH v3 02/12] doc: pg_dump --no-toast-compression

---
 doc/src/sgml/ref/pg_dump.sgml | 12 
 1 file changed, 12 insertions(+)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index bcbb7a25fb..989b8e2381 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -931,6 +931,18 @@ PostgreSQL documentation
   
  
 
+ 
+  --no-toast-compression
+  
+   
+Do not output commands to set TOAST compression
+methods.
+With this option, all objects will be created using whichever
+compression method is the default during restore.
+   
+  
+ 
+
  
   --no-unlogged-table-data
   
-- 
2.17.0

From 7e9b4b3c6f12abaf7531ef5109e084dc188e7dda Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 19 Mar 2021 20:23:40 -0500
Subject: [PATCH v3 03/12] Compression method is a char not an OID

---
 src/backend/commands/tablecmds.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 877c08814e..22f3c5efc0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -7847,6 +7847,7 @@ SetIndexStorageProperties(Relation rel, Relation attrelation,
 		index_close(indrel, lockmode);
 	}
 }
+
 /*
  * ALTER TABLE ALTER COLUMN SET STORAGE
  *
@@ -15070,7 +15071,7 @@ ATExecSetCompression(AlteredTableInfo *tab,
 	AttrNumber	attnum;
 	char	   *compression;
 	char		typstorage;
-	Oid			cmoid;
+	char		cmethod;
 	ObjectAddress address;
 
 	Assert(IsA(newValue, String));
@@ -15104,10 +15105,10 @@ ATExecSetCompression(AlteredTableInfo *tab,
						format_type_be(atttableform->atttypid))));
 
 	/* get the attribute compression method. */
-	cmoid = GetAttributeCompression(atttableform, compression);
+	cmethod = GetAttributeCompression(atttableform, compression);
 
 	/* update pg_attribute entry */
-	atttableform->attcompression = cmoid;
+	atttableform->attcompression = cmethod;
	CatalogTupleUpdate(attrel, &tuple->t_self, tuple);
 
 	InvokeObjectPostAlterHook(RelationRelationId,
@@ -15118,7 +15119,7 @@ ATExecSetCompression(AlteredTableInfo *tab,
 	 * Apply the change to indexes as well (only for simple index columns,
 	 * matching behavior of index.c ConstructTupleDescriptor()).
 	 */
-	SetIndexStorageProperties(rel, attrel, attnum, cmoid, '\0', lockmode);
+	SetIndexStorageProperties(rel, attrel, attnum, cmethod, '\0', lockmode);
 
 	heap_freetuple(tuple);
 
-- 
2.17.0

From 2262c8ffb041e8a7452bda724c676b6118895f4b Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 19 Mar 2021 20:07:31 -0500
Subject: [PATCH v3 04/12] Remove duplicative macro

---
 src/include/access/toast_compression.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/include/access/toast_compression.h b/src/include/access/toast_compression.h
index 5c9220c171..44b73bd57c 100644
--- a/src/include/access/toast_compression.h
+++ b/src/include/access/toast_compression.h
@@ -50,8 +50,6 @@ typedef enum 

Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-21 Thread Tom Lane
Justin Pryzby  writes:
> On Sun, Mar 21, 2021 at 07:11:50PM -0400, Tom Lane wrote:
>> I hate to be the bearer of bad news, but this suggests that
>> LZ4_decompress_safe_partial is seriously broken in 1.9.2
>> as well:
>> https://github.com/lz4/lz4/issues/783

> Ouch

Actually, after reading that closer, the problem only affects the
case where the compressed-data-length passed to the function is
a lie.  So it shouldn't be a problem for our usage.

Also, after studying the documentation for LZ4_decompress_safe
and LZ4_decompress_safe_partial, I realized that liblz4 is also
counting on the *output* buffer size to not be a lie.  So we
cannot pass it a number larger than the chunk's true decompressed
size.  The attached patch resolves the issue I'm seeing.

regards, tom lane
diff --git a/src/backend/access/common/toast_compression.c b/src/backend/access/common/toast_compression.c
index 00af1740cf..74e449992a 100644
--- a/src/backend/access/common/toast_compression.c
+++ b/src/backend/access/common/toast_compression.c
@@ -220,6 +220,10 @@ lz4_decompress_datum_slice(const struct varlena *value, int32 slicelength)
 	if (LZ4_versionNumber() < 10803)
 		return lz4_decompress_datum(value);
 
+	/* liblz4 assumes that slicelength is not an overestimate */
+	if (slicelength >= VARRAWSIZE_4B_C(value))
+		return lz4_decompress_datum(value);
+
 	/* allocate memory for the uncompressed data */
 	result = (struct varlena *) palloc(slicelength + VARHDRSZ);
 
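To make the liblz4 contract explicit, a hedged usage sketch (the function
and variable names here are illustrative):

#include <lz4.h>

/*
 * Decompress only the first target_bytes of an LZ4 block.  src_size must
 * be the exact compressed size, dst_capacity a truthful bound on dst, and
 * target_bytes must not exceed the chunk's true decompressed size.
 */
static int
decompress_prefix(const char *src, int src_size,
                  char *dst, int dst_capacity, int target_bytes)
{
    /* returns the number of bytes decoded, or a negative value on error */
    return LZ4_decompress_safe_partial(src, dst, src_size,
                                       target_bytes, dst_capacity);
}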


Re: Support for NSS as a libpq TLS backend

2021-03-21 Thread Stephen Frost
Greetings,

* Daniel Gustafsson (dan...@yesql.se) wrote:
> Attached is a rebase which attempts to fix the cfbot Appveyor failure, there
> were missing HAVE_ defines for MSVC.

> Subject: [PATCH v30 1/9] nss: Support libnss as TLS library in libpq
> 
> This commit contains the frontend and backend portion of TLS support
> in libpq to allow encrypted connections. The implementation is done

maybe add 'using NSS' to that first sentence. ;)

> +++ b/src/backend/libpq/auth.c
> @@ -2849,7 +2849,14 @@ CheckCertAuth(Port *port)
>  {
>   int status_check_usermap = STATUS_ERROR;
>  
> +#if defined(USE_OPENSSL)
>   Assert(port->ssl);
> +#elif defined(USE_NSS)
> + /* TODO: should we rename pr_fd to ssl, to keep consistency? */
> + Assert(port->pr_fd);
> +#else
> + Assert(false);
> +#endif

Having thought about this TODO item for a bit, I tend to think it's
better to keep them distinct.  They aren't the same and it might not be
clear what's going on if one was to somehow mix them (at least if pr_fd
continues to sometimes be a void*, but I wonder why that's being
done..?  more on that later..).

> +++ b/src/backend/libpq/be-secure-nss.c
[...]
> +/* default init hook can be overridden by a shared library */
> +static void default_nss_tls_init(bool isServerStart);
> +nss_tls_init_hook_type nss_tls_init_hook = default_nss_tls_init;

> +static PRDescIdentity pr_id;
> +
> +static PRIOMethods pr_iomethods;

Happy to be told I'm missing something, but the above two variables seem
to only be used in init_iolayer.. is there a reason they're declared
here instead of just being declared in that function?

> + /*
> +  * Set the fallback versions for the TLS protocol version range to a
> +  * combination of our minimal requirement and the library maximum. Error
> +  * messages should be kept identical to those in be-secure-openssl.c to
> +  * make translations easier.
> +  */

Should we pull these error messages out into another header so that
they're in one place to make sure they're kept consistent, if we really
want to put the effort in to keep them the same..?  I'm not 100% sure
that it's actually necessary to do so, but defining these in one place
would help maintain this if we want to.  Also alright with just keeping
the comment, not that big of a deal.

> +int
> +be_tls_open_server(Port *port)
> +{
> + SECStatus   status;
> + PRFileDesc *model;
> + PRFileDesc *pr_fd;

pr_fd here is materially different from port->pr_fd, no?  As in, one is
the NSS raw TCP fd while the other is the SSL fd, right?  Maybe we
should use two different variable names to try and make sure they don't
get confused?  Might even set this to NULL after we are done with it
too..  Then again, I see later on that when we do the dance with the
'model' PRFileDesc that we just use the same variable- maybe we should
do that?  That is, just get rid of this 'pr_fd' and use port->pr_fd
always?

> + /*
> +  * The NSPR documentation states that runtime initialization via PR_Init
> +  * is no longer required, as the first caller into NSPR will perform the
> +  * initialization implicitly. The documentation doesn't however clarify
> +  * from which version this is holds true, so let's perform the 
> potentially
> +  * superfluous initialization anyways to avoid crashing on older 
> versions
> +  * of NSPR, as there is no difference in overhead.  The NSS 
> documentation
> +  * still states that PR_Init must be called in some way (implicitly or
> +  * explicitly).
> +  *
> +  * The below parameters are what the implicit initialization would've 
> done
> +  * for us, and should work even for older versions where it might not be
> +  * done automatically. The last parameter, maxPTDs, is set to various
> +  * values in other codebases, but has been unused since NSPR 2.1 which 
> was
> +  * released sometime in 1998. In current versions of NSPR all parameters
> +  * are ignored.
> +  */
> + PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0 /* maxPTDs */ );
> +
> + /*
> +  * The certificate path (configdir) must contain a valid NSS database. 
> If
> +  * the certificate path isn't a valid directory, NSS will fall back on 
> the
> +  * system certificate database. If the certificate path is a directory 
> but
> +  * is empty then the initialization will fail. On the client side this 
> can
> +  * be allowed for any sslmode but the verify-xxx ones.
> +  * https://bugzilla.redhat.com/show_bug.cgi?id=728562 For the server 
> side
> +  * we won't allow this to fail however, as we require the certificate 
> and
> +  * key to exist.
> +  *
> +  * The original design of NSS was for a single application to use a 
> single
> +  * copy of it, initialized with NSS_Initialize() which isn't returning 
> any
> +  * handle with which to refer to NSS. NSS initialization and shutdown 
> are
> +  

Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-21 Thread Justin Pryzby
On Sun, Mar 21, 2021 at 07:11:50PM -0400, Tom Lane wrote:
> I wrote:
> > Justin Pryzby  writes:
> >> On Sun, Mar 21, 2021 at 04:32:31PM -0400, Tom Lane wrote:
> >>> This seems somewhat repeatable (three identical failures in three
> >>> attempts).  Not sure why I did not see it yesterday; but anyway,
> >>> there is something wrong with partial detoasting for LZ4.
> 
> >> With what version of LZ4 ?
> 
> > RHEL8's, which is
> > lz4-1.8.3-2.el8.x86_64
> 
> I hate to be the bearer of bad news, but this suggests that
> LZ4_decompress_safe_partial is seriously broken in 1.9.2
> as well:
> 
> https://github.com/lz4/lz4/issues/783

Ouch

> Maybe we cannot rely on that function for a few more years yet.
> 
> Also, I don't really understand why this code:
> 
>   /* slice decompression not supported prior to 1.8.3 */
>   if (LZ4_versionNumber() < 10803)
>   return lz4_decompress_datum(value);
> 
> It seems likely to me that we'd get a flat out build failure
> from library versions lacking LZ4_decompress_safe_partial,
> and thus that this run-time test is dead code and we should
> better be using a configure probe if we intend to allow old
> liblz4 versions.  Though that might be moot.

The function existed before 1.8.3, but didn't handle slicing.
https://github.com/lz4/lz4/releases/tag/v1.8.3
|Finally, an existing function, LZ4_decompress_safe_partial(), has been 
enhanced to make it possible to decompress only the beginning of an LZ4 block, 
up to a specified number of bytes. Partial decoding can be useful to save CPU 
time and memory, when the objective is to extract a limited portion from a 
larger block.

Possibly we could allow v >= 1.9.3 || (ver >= 1.8.3 && ver < 1.9.2).

Or maybe not: the second half apparently worked "by accident", and we shouldn't
need to have intimate knowledge of someone else's patchlevel releases.

-- 
Justin




Re: Table AM and DDLs

2021-03-21 Thread Andres Freund
Hi,

On 2021-03-03 22:15:18 +0100, Mats Kindahl wrote:
> On Tue, Feb 23, 2021 at 2:11 AM Andres Freund  wrote:
> Thanks for the answer and sorry about the late reply.

Mine is even later ;)


> > I don't think that's quite right. It's not exactly obvious from the
> > name, but RelationDropStorage() does not actually drop storage. Instead
> > it *schedules* the storage to be dropped upon commit.
> >
> > The reason for deferring the dropping of table storage is that DDL in
> > postgres is transactional. Therefore we cannot remove the storage at the
> > moment the DROP TABLE is executed - only when the transaction that
> > performed the DDL commits. Therefore just providing you with a callback
> > that runs in heap_drop_with_catalog() doesn't really achieve much -
> > you'd not have a way to execute the "actual" dropping of the relation at
> > the later stage.
> >
> 
> Yeah, I found the chain (performDeletion -> deleteOneObject -> doDeletion
> -> heap_drop_with_catalog) where the delete was just scheduled for deletion
> but it appeared like this was the place to actually perform the "actual"
> delete. Looking closer, I see this was the wrong location. However, the
> intention was to get a callback when the "actual" delete should happen.
> Before that, the blocks are still potentially alive and could be read, so
> shouldn't be recycled.
> 
> It seems the right location seems to be in the storage manager (smgr_unlink
> in smgr.c), but that does not seem to be extensible, or are there any plans
> to make it available so that you can implement something other than just
> "magnetic disk"?

There've been patches to add new types of storage below smgr.c, but not
in a way that can be done outside of build time, as far as I recall.


> > Before I explain some more: Could you describe in a bit more detail what
> > kind of optimization you'd like to make?
> >
> 
This is not really about any optimizations; it's more about a good API for
> tables and managing storage. If a memory table can be implemented entirely
> in the extension and storage managed fully, there is a lot of interesting
> potential for various implementations of table backends. For this to work I
> think it is necessary to be able to handle schema changes for the backend
> storage in addition to scans, inserts, updates, and deletes, but I am not
> sure if it is already possible in some way that I haven't discovered or if
> I should just try to propose something (making the storage manager API
> extensible seems like a good first attempt).

As long as you have a compatible definition of what is acceptable "in
place" ALTER TABLE (e.g. adding new columns, changing between compatible
types), and what requires a table rewrite (e.g. an incompatible column
type change), I don't see a real problem. Except for the unlink thing
above.

Any schema change requiring a table rewrite will trigger a new relation
to be created, which in turn will involve tableam. After that you'll
just get called back to re-insert all the tuples in the original
relation.

If you want a different definition of what needs a rewrite, good luck;
it'll be a heck of a lot more work.



> > Due to postgres' transactional DDL you cannot really change the storage
> > layout of *existing data* when that DDL command is executed - the data
> > still needs to be interpretable in case the DDL is rolled back
> > (including when crashing).
> >
> 
> No, didn't expect this, but some means to see that a schema change is about
> to happen.

For anything that's not in-place you'll see a new table being created
(c.f. ATRewriteTables() calling make_new_heap()). The relfilenode
identifying the data (as opposed to the oid, identifying a relation),
will then be swapped with the current table's relfilenode via
finish_heap_swap().


> > Other changes, e.g. changing the type of a column "sufficiently", will
> > cause a so called table rewrite. Which means that a new relation will be
> > created (including a call to relation_set_new_filenode()), then that new
> > relation will get all the new data inserted, and then
> > pg_class->relfilenode for the "original" relation will be changed to the
> > "rewritten" table (there's two variants of this, once for rewrites due
> > to ALTER TABLE and a separate one for VACUUM FULL/CLUSTER).

> But that is not visible in the access method interface. If I add debug
> output to the memory table, I only see a call to needs_toast_table. If
> there were a new call to create a new block and some additional information
> about the change, this would be possible to handle.

It should be. If I e.g. do

CREATE TABLE blarg(id int4 not null);
I get one call to table_relation_set_new_filenode()

#0  table_relation_set_new_filenode (rel=0x7f84c2417b70, 
newrnode=0x7f84c2417b70, persistence=112 'p', freezeXid=0x7ffc8c61263c, 
minmulti=0x7ffc8c612638)
at /home/andres/src/postgresql/src/include/access/tableam.h:1596
#1  0x55b1901e9116 in heap_create (relname=0x7ffc8c612900 "blarg", 
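To make the shape of that hook concrete, here is a hedged sketch of what a
memory-table AM's callback could look like (memtable_create_store() is a
hypothetical helper; the signature matches the backtrace above):

static void
memtable_relation_set_new_filenode(Relation rel,
                                   const RelFileNode *newrnode,
                                   char persistence,
                                   TransactionId *freezeXid,
                                   MultiXactId *minmulti)
{
    /* no frozen-XID bookkeeping for an in-memory table */
    *freezeXid = InvalidTransactionId;
    *minmulti = InvalidMultiXactId;

    /* unlogged/temp handling (persistence) omitted in this sketch */

    /* allocate a fresh in-memory store keyed by the new relfilenode */
    memtable_create_store(newrnode);
}

This gets called both at CREATE TABLE time and when a rewriting ALTER TABLE
creates its transient relation, which is the hook point described above.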

Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-21 Thread Tom Lane
I wrote:
> Justin Pryzby  writes:
>> On Sun, Mar 21, 2021 at 04:32:31PM -0400, Tom Lane wrote:
>>> This seems somewhat repeatable (three identical failures in three
>>> attempts).  Not sure why I did not see it yesterday; but anyway,
>>> there is something wrong with partial detoasting for LZ4.

>> With what version of LZ4 ?

> RHEL8's, which is
> lz4-1.8.3-2.el8.x86_64

I hate to be the bearer of bad news, but this suggests that
LZ4_decompress_safe_partial is seriously broken in 1.9.2
as well:

https://github.com/lz4/lz4/issues/783

Maybe we cannot rely on that function for a few more years yet.

Also, I don't really understand why this code:

/* slice decompression not supported prior to 1.8.3 */
if (LZ4_versionNumber() < 10803)
return lz4_decompress_datum(value);

It seems likely to me that we'd get a flat out build failure
from library versions lacking LZ4_decompress_safe_partial,
and thus that this run-time test is dead code and we should
better be using a configure probe if we intend to allow old
liblz4 versions.  Though that might be moot.

regards, tom lane




Re: replication cleanup code incorrect way to use of HTAB HASH_REMOVE ?

2021-03-21 Thread Andres Freund
Hi,

On 2021-03-22 11:20:37 +1300, Thomas Munro wrote:
> On Mon, Mar 22, 2021 at 10:50 AM Peter Smith  wrote:
> > The real problem isn't the Assert. It's all those other usages of ent
> > disobeying the API rule: "(NB: in the case of the REMOVE action, the
> > result is a dangling pointer that shouldn't be dereferenced!)"

Right that's clearly not ok.


> I suppose the HASH_REMOVE case could clobber the object with 0x7f if
> CLOBBER_FREED_MEMORY is defined (typically assertion builds)

Yea. Plus VALGRIND_MAKE_MEM_NOACCESS() (requiring a
VALGRIND_MAKE_MEM_UNDEFINED() when reusing) or at least a
VALGRIND_MAKE_MEM_UNDEFINED().
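Something along these lines inside the HASH_REMOVE path (placement is
hypothetical; wipe_mem() from memdebug.h already does the 0x7f memset plus
the Valgrind NOACCESS marking):

#ifdef CLOBBER_FREED_MEMORY
    /*
     * Poison the removed entry so any use-after-remove fails loudly.
     * Reusing it from the freelist would then need
     * VALGRIND_MAKE_MEM_UNDEFINED() before handing it out again.
     */
    wipe_mem(ELEMENTKEY(currBucket), hctl->entrysize);
#endif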


>or alternatively return some other non-NULL but poisoned pointer, so
>that problems of this ilk blow up in early testing.

IMO it's just a bad API to combine the different use cases into
hash_search(). It's kinda defensible to have FIND/ENTER/ENTER_NULL go
through the same function (although I do think it makes code harder to
read), but having HASH_REMOVE is just wrong. The only reason for
returning a dangling pointer is that that's the obvious way to check if
something was found.

If they weren't combined we could tell newer compilers that the memory
shouldn't be accessed after the HASH_REMOVE anymore. And it'd remove
some unnecessary branches in performance critical code...

But I guess making dynahash not terrible from that POV basically means
replacing all of dynahash. Having all those branches for partitioned
hashes and actions is really not great.

Greetings,

Andres Freund




Re: replication cleanup code incorrect way to use of HTAB HASH_REMOVE ?

2021-03-21 Thread Peter Smith
On Mon, Mar 22, 2021 at 9:21 AM Thomas Munro  wrote:
>
> On Mon, Mar 22, 2021 at 10:50 AM Peter Smith  wrote:
> > The real problem isn't the Assert. It's all those other usages of ent
> > disobeying the API rule: "(NB: in the case of the REMOVE action, the
> > result is a dangling pointer that shouldn't be dereferenced!)"
>
> I suppose the HASH_REMOVE case could clobber the object with 0x7f if
> CLOBBER_FREED_MEMORY is defined (typically assertion builds), or
> alternatively return some other non-NULL but poisoned pointer, so that
> problems of this ilk  blow up in early testing.

+1, but I'm not sure the poisoned-pointer alternative can work, because some
code (e.g. see the RemoveTargetIfNoLongerUsed function) asserts on the
returned pointer's actual value, not just its NULL-ness.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: shared memory stats: high level design decisions: consistency, dropping

2021-03-21 Thread Andres Freund
Hi,

On 2021-03-21 12:14:35 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > 1) What kind of consistency do we want from the pg_stats_* views?
>
> That's a hard choice to make.  But let me set the record straight:
> when we did the initial implementation, the stats snapshotting behavior
> was considered a FEATURE, not an "efficiency hack required by the old
> storage model".

Oh - sorry for misstating that then. I did try to look for the origins of the
approach, and all that I found was that it'd be too expensive to do multiple
stats file reads.


> If I understand what you are proposing, all stats views would become
> completely volatile, without even within-query consistency.  That really
> is not gonna work.  As an example, you could get not-even-self-consistent
> results from a join to a stats view if the planner decides to implement
> it as a nestloop with the view on the inside.

I don't really think it's a problem that's worth incurring that much cost to
prevent. We already have that behaviour for a number of the pg_stat_* views,
e.g. pg_stat_xact_all_tables, pg_stat_replication.

If the cost were low - or we can find a reasonable way to get to low costs - I
think it'd be worth preserving for backward compatibility's sake alone.  From
an application perspective, I actually rarely want that behaviour for stats
views - I'm querying them to get the most recent information, not an older
snapshot. And in the cases I do want snapshots, I'd want them for longer than a
transaction.

There's just a huge difference between being able to access a table's stats in
O(1) time and having a single stats access be O(database-objects).

And that includes accesses to things like pg_stat_bgwriter, pg_stat_database
(for IO over time stats etc) that often are monitored at a somewhat high
frequency - they also pay the price of reading in all object stats.  On my
example database with 1M tables it takes 0.4s to read pg_stat_database.


We currently also fetch the full stats in places like autovacuum.c, where we
don't need repeated access to be consistent - we even explicitly force the
stats to be re-read for every single table that's getting vacuumed.

Even if we to just cache already accessed stats, places like do_autovacuum()
would end up with a completely unnecessary cache of all tables, blowing up
memory usage by a large amount on systems with lots of relations.


> I also believe that the snapshotting behavior has advantages in terms
> of being able to perform multiple successive queries and get consistent
> results from them.  Only the most trivial sorts of analysis don't need
> that.

In most cases you'd not do that in a transaction tho, and you'd need to create
temporary tables with a snapshot of the stats anyway.


> In short, what you are proposing sounds absolutely disastrous for
> usability of the stats views, and I for one will not sign off on it
> being acceptable.

:(

That's why I thought it'd be important to bring this up to a wider
audience. This has been discussed several times in the thread, and nobody
really chimed in wanting the "snapshot" behaviour...


> I do think we could relax the consistency guarantees a little bit,
> perhaps along the lines of only caching view rows that have already
> been read, rather than grabbing everything up front.  But we can't
> just toss the snapshot concept out the window.  It'd be like deciding
> that nobody needs MVCC, or even any sort of repeatable read.

I think that'd still be a huge win - caching only what's been accessed rather than
everything will save a lot of memory in very common cases. I did bring it up as
one approach for that reason.

I do think it has a few usability quirks though. The time-skew between stats
objects accessed at different times seems like it could be quite confusing?
E.g. imagine looking at table stats and then later joining to index stats and
seeing table / index stats not matching up at all.


I wonder if a reasonable way out could be to have pg_stat_make_snapshot()
(accompanying the existing pg_stat_clear_snapshot()) that'd do the full eager
data load. But not use any snapshot / caching behaviour without that?

It'd be a fair bit of code to have that, but I think I can see a way to have it
not be too bad?
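The SQL-callable entry point itself would be trivial; the work is in the
eager-load machinery behind it.  A hypothetical sketch
(pgstat_snapshot_all() does not exist today):

Datum
pg_stat_make_snapshot(PG_FUNCTION_ARGS)
{
    /* eagerly load all stats and pin them for the rest of the xact */
    pgstat_snapshot_all();

    PG_RETURN_VOID();
}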


Greetings,

Andres Freund




Re: 64-bit XIDs in deleted nbtree pages

2021-03-21 Thread Peter Geoghegan
On Wed, Mar 10, 2021 at 5:34 PM Peter Geoghegan  wrote:
> Here is another bitrot-fix-only revision, v9. Just the recycling patch again.

I committed the final nbtree page deletion patch just now -- the one
that attempts to make recycling happen for newly deleted pages. Thanks
for all your work on patch review, Masahiko!

I'll close out the CF item for this patch series now.

-- 
Peter Geoghegan




Re: replication cleanup code incorrect way to use of HTAB HASH_REMOVE ?

2021-03-21 Thread Thomas Munro
On Mon, Mar 22, 2021 at 10:50 AM Peter Smith  wrote:
> The real problem isn't the Assert. It's all those other usages of ent
> disobeying the API rule: "(NB: in the case of the REMOVE action, the
> result is a dangling pointer that shouldn't be dereferenced!)"

I suppose the HASH_REMOVE case could clobber the object with 0x7f if
CLOBBER_FREED_MEMORY is defined (typically assertion builds), or
alternatively return some other non-NULL but poisoned pointer, so that
problems of this ilk  blow up in early testing.




Re: shared memory stats: high level design decisions: consistency, dropping

2021-03-21 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> If I understand what you are proposing, all stats views would become
> completely volatile, without even within-query consistency.  That really
> is not gonna work.  As an example, you could get not-even-self-consistent
> results from a join to a stats view if the planner decides to implement
> it as a nestloop with the view on the inside.
> 
> I also believe that the snapshotting behavior has advantages in terms
> of being able to perform multiple successive queries and get consistent
> results from them.  Only the most trivial sorts of analysis don't need
> that.
> 
> In short, what you are proposing sounds absolutely disastrous for
> usability of the stats views, and I for one will not sign off on it
> being acceptable.
> 
> I do think we could relax the consistency guarantees a little bit,
> perhaps along the lines of only caching view rows that have already
> been read, rather than grabbing everything up front.  But we can't
> just toss the snapshot concept out the window.  It'd be like deciding
> that nobody needs MVCC, or even any sort of repeatable read.

This isn't the same use-case as traditional tables or relational
concepts in general- there aren't any foreign keys for the fields that
would actually be changing across these accesses to the shared memory
stats- we're talking about gross stats numbers like the number of
inserts into a table, not an employee_id column.  In short, I don't
agree that this is a fair comparison.

Perhaps there's a good argument to try and cache all this info per
backend, but saying that it's because we need MVCC-like semantics for
this data just because other things need MVCC isn't it, and I don't know
that calling this a justifiable case for requiring repeatable read is
reasonable either.

What specific, reasonable, analysis of the values that we're actually
talking about, which are already aggregates themselves, is going to
end up being utterly confused?

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: recovery_init_sync_method=wal

2021-03-21 Thread Stephen Frost
Greetings,

* Thomas Munro (thomas.mu...@gmail.com) wrote:
> On Mon, Mar 22, 2021 at 4:31 AM Stephen Frost  wrote:
> > Presuming that we do add to the documentation the language to document
> > what's assumed (and already done by modern backup tools) that they're
> > fsync'ing everything they're restoring, do we/can we have an option
> > which those tools could set that explicitly tells PG "everything in the
> > cluster has been fsync'd already, you don't need to do anything extra"?
> > Perhaps also/separately one for WAL that's restored with restore command
> > if we think that's necessary?
> 
> In the earlier thread, we did contemplate
> recovery_init_sync_method=none, but it has the problem that after
> recovery completes you have a cluster running with a setting that is
> really bad if you eventually crash again and run crash recovery again,
> this time with a dirty kernel page cache.  I was one of the people
> voting against that feature, but I also wrote a strawman patch just
> for the visceral experience of every cell in my body suddenly
> whispering "yeah, nah, I'm not committing that" as I wrote the
> weaselwordage for the manual.

Why not have a 'recovery_init_sync_method=backup'?  It's not like
there's a question about whether we're doing recovery from a backup or not.

> In that thread we also contemplated safe ways for a basebackup-type
> tool to promise that data has been sync'd, to avoid that problem with
> the GUC.  Maybe something like: the backup label file could contain a
> "SYNC_DONE" message.  But then what if someone copies the whole
> directory to a new location, how can you invalidate the promise?  This
> is another version of the question of whether it's our problem or the
> user's to worry about buffering of pgdata files that they copy around
> with unknown tools.  If it's our problem, maybe something like:
> "SYNC_DONE for pgdata_inode=1234, hostname=x, ..." is enough to detect
> cases where we still believe the claim.  But there's probably a long
> tail of weird ways for whatever you come up with to be deceived.

I don't really care for the idea of backup tools modifying the backup
label... they're explicitly told not to do that and that seems like the
best move.  I also don't particularly care about silly "what ifs" like
if someone randomly copies the data directory after the restore; yes,
there are a lot of ways that people can screw things up by doing things
that aren't sane, but that doesn't mean we should try to cater to such
cases.

> In the case of the recovery_init_sync_method=wal patch proposed in
> *this* thread, here's the thing: there's not much to gain by trying to
> skip the sync, anyway!  For the WAL, you'll be opening those files
> soon anyway to replay them, and if they're already sync'd then fsync()
> will return quickly.  For the relfile data, you'll be opening all
> relfiles that are referenced by the WAL soon anyway, and syncing them
> if required.  So that just leaves relfiles that are not referenced in
> the WAL you're about to replay.  Whether we have a duty to sync those
> too is that central question again, and one of the things I'm trying
> to get an answer to with this thread.  The
> recovery_init_sync_method=wal patch only makes sense if the answer is
> "no".

I'm not too bothered by an extra fsync() for WAL files, just to be
clear; it's running around fsync'ing everything else that seems
objectionable to me.

> > Otherwise, just in general, agree with doing this to address the risks
> > discussed around regular crash recovery.  We have some pretty clear "if
> > the DB was doing recovery and was interrupted, you need to restore from
> > backup" messages today in xlog.c, and this patch didn't seem to change
> > that?  Am I missing something or isn't the idea here that these changes
> > would make it so you aren't going to end up with corruption in those
> > cases?  Specifically looking at-
> >
> > xlog.c:6509-
> > case DB_IN_CRASH_RECOVERY:
> > ereport(LOG,
> > (errmsg("database system was interrupted while in 
> > recovery at %s",
> > str_time(ControlFile->time)),
> >  errhint("This probably means that some data is 
> > corrupted and"
> >  " you will have to use the last backup for 
> > recovery.")));
> > break;
> >
> > case DB_IN_ARCHIVE_RECOVERY:
> > ereport(LOG,
> > (errmsg("database system was interrupted while in 
> > recovery at log time %s",
> > str_time(ControlFile->checkPointCopy.time)),
> >  errhint("If this has occurred more than once some data 
> > might be corrupted"
> >  " and you might need to choose an earlier 
> > recovery target.")));
> > break;
> 
> Maybe I missed your point... but I don't think anything changes here?
> If recovery is crashing in some deterministic way (not just because

Re: recovery_init_sync_method=wal

2021-03-21 Thread Thomas Munro
On Mon, Mar 22, 2021 at 4:31 AM Stephen Frost  wrote:
> Presuming that we do add to the documentation the language to document
> what's assumed (and already done by modern backup tools) that they're
> fsync'ing everything they're restoring, do we/can we have an option
> which those tools could set that explicitly tells PG "everything in the
> cluster has been fsync'd already, you don't need to do anything extra"?
> Perhaps also/separately one for WAL that's restored with restore command
> if we think that's necessary?

In the earlier thread, we did contemplate
recovery_init_sync_method=none, but it has the problem that after
recovery completes you have a cluster running with a setting that is
really bad if you eventually crash again and run crash recovery again,
this time with a dirty kernel page cache.  I was one of the people
voting against that feature, but I also wrote a strawman patch just
for the visceral experience of every cell in my body suddenly
whispering "yeah, nah, I'm not committing that" as I wrote the
weaselwordage for the manual.

In that thread we also contemplated safe ways for a basebackup-type
tool to promise that data has been sync'd, to avoid that problem with
the GUC.  Maybe something like: the backup label file could contain a
"SYNC_DONE" message.  But then what if someone copies the whole
directory to a new location, how can you invalidate the promise?  This
is another version of the question of whether it's our problem or the
user's to worry about buffering of pgdata files that they copy around
with unknown tools.  If it's our problem, maybe something like:
"SYNC_DONE for pgdata_inode=1234, hostname=x, ..." is enough to detect
cases where we still believe the claim.  But there's probably a long
tail of weird ways for whatever you come up with to be deceived.

In the case of the recovery_init_sync_method=wal patch proposed in
*this* thread, here's the thing: there's not much to gain by trying to
skip the sync, anyway!  For the WAL, you'll be opening those files
soon anyway to replay them, and if they're already sync'd then fsync()
will return quickly.  For the relfile data, you'll be opening all
relfiles that are referenced by the WAL soon anyway, and syncing them
if required.  So that just leaves relfiles that are not referenced in
the WAL you're about to replay.  Whether we have a duty to sync those
too is that central question again, and one of the things I'm trying
to get an answer to with this thread.  The
recovery_init_sync_method=wal patch only makes sense if the answer is
"no".

> Otherwise, just in general, agree with doing this to address the risks
> discussed around regular crash recovery.  We have some pretty clear "if
> the DB was doing recovery and was interrupted, you need to restore from
> backup" messages today in xlog.c, and this patch didn't seem to change
> that?  Am I missing something or isn't the idea here that these changes
> would make it so you aren't going to end up with corruption in those
> cases?  Specifically looking at-
>
> xlog.c:6509-
> case DB_IN_CRASH_RECOVERY:
> ereport(LOG,
> (errmsg("database system was interrupted while in 
> recovery at %s",
> str_time(ControlFile->time)),
>  errhint("This probably means that some data is corrupted 
> and"
>  " you will have to use the last backup for 
> recovery.")));
> break;
>
> case DB_IN_ARCHIVE_RECOVERY:
> ereport(LOG,
> (errmsg("database system was interrupted while in 
> recovery at log time %s",
> str_time(ControlFile->checkPointCopy.time)),
>  errhint("If this has occurred more than once some data 
> might be corrupted"
>  " and you might need to choose an earlier 
> recovery target.")));
> break;

Maybe I missed your point... but I don't think anything changes here?
If recovery is crashing in some deterministic way (not just because
you lost power the first time, but rather because a particular log
record hits the same bug or gets confused by the same corruption and
implodes every time) it'll probably keep doing so, and our sync
algorithm doesn't seem to make a difference to that.




Re: shared memory stats: high level design decisions: consistency, dropping

2021-03-21 Thread Andres Freund
Hi,

On 2021-03-21 11:41:30 -0400, Stephen Frost wrote:
> > 1.1)
> >
> > I hope everybody agrees that stats don't need to be
> > the way they were at the time of the first stat access in a transaction,
> > even if that first access was to a different stat object than the
> > currently accessed stat?
>
> Agreed, that doesn't seem necessary and blowing up backend memory usage
> by copying all the stats into local memory seems pretty terrible.

Yea. I've seen instances where most backends had several hundred MB of
stats loaded :(. Even leaving the timing overhead aside, that's really
not fun. Of course that application may not have had exactly the
greatest design, but ...


> > 1.2)
> >
> > Do we expect repeated accesses to the same stat to stay the same through
> > the transaction?  The easiest way to implement stats accesses is to
> > simply fetch the stats from shared memory ([3]). That would obviously
> > result in repeated accesses to the same stat potentially returning
> > changing results over time.
> >
> > I think that's perfectly fine, desirable even, for pg_stat_*.
>
> This seems alright to me.

Seems Tom disagrees :(


> > 1.3)
> >
> > What kind of consistency do we expect between columns of views like
> > pg_stat_all_tables?
> > [...]
> > I personally think it's fine to have short-term divergences between the
> > columns. The stats aren't that accurate anyway, as we don't submit them
> > all the time.  And that if we want consistency between columns, we
> > instead should replace the current view definitions with[set of] record
> > returning function - everything else seems to lead to weird tradeoffs.
>
> Agreed, doesn't seem like a huge issue to have short-term divergences
> but if we want to fix them then flipping those all to SRFs would make
> the most sense.

There's also a pretty good efficiency argument for going to SRFs. Doing
18 hashtable lookups + function calls just to return one row of
pg_stat_all_tables surely is a higher overhead than unnecessarily
returning columns that weren't needed by the user.

I do think it makes sense to get idx_scan/idx_tup_fetch via a join
though.


> > 2) How to remove stats of dropped objects?
> >
> > [...]
>
> The current approach sounds pretty terrible and propagating that forward
> doesn't seem great.  Guess here I'd disagree with your gut feeling and
> encourage trying to go 'full in', as you put it, or at least put enough
> effort into it to get a feeling of if it's going to require a *lot* more
> work or not and then reconsider if necessary.

I think my gut's argument is that it's already a huge patch, and that
it's better to have the very substantial memory and disk IO savings
with the crappy vacuum approach, than neither. And given the timeframe
there does seem to be a substantial danger of "neither" being the
outcome...  Anyway, I'm mentally sketching out what it'd take.

Greetings,

Andres Freund




Re: replication cleanup code incorrect way to use of HTAB HASH_REMOVE ?

2021-03-21 Thread Peter Smith
On Sun, Mar 21, 2021 at 8:54 PM Amit Kapila  wrote:
>
> On Sat, Mar 20, 2021 at 12:54 PM Peter Smith  wrote:
> >
> > PSA my patch to correct this by firstly doing a HASH_FIND, then only
> > HASH_REMOVE after we've finished using the ent.
> >
>
> Why can't we keep using HASH_REMOVE as it is but get the output (entry
> found or not) in the last parameter of hash_search API and then
> perform Assert based on that? See similar usage in reorderbuffer.c and
> rewriteheap.c.
>

Changing the Assert doesn't do anything to fix the problem as
described, i.e. dereferencing of ent after the HASH_REMOVE.

The real problem isn't the Assert. It's all those other usages of ent
disobeying the API rule: "(NB: in the case of the REMOVE action, the
result is a dangling pointer that shouldn't be dereferenced!)"

e.g.
-  SharedFileSetDeleteAll(ent->stream_fileset);
-  pfree(ent->stream_fileset);
-  ent->stream_fileset = NULL;
-  if (ent->subxact_fileset)
-  SharedFileSetDeleteAll(ent->subxact_fileset);
-  pfree(ent->subxact_fileset);
-  ent->subxact_fileset = NULL;
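i.e. the fix keeps the lookup and the removal as separate steps, roughly
like this (a sketch; the struct and field names follow the quoted code):

  StreamXidHash *ent;
  bool           found;

  ent = (StreamXidHash *) hash_search(xidhash, &xid, HASH_FIND, NULL);
  Assert(ent != NULL);

  /* clean up while the entry is still valid */
  SharedFileSetDeleteAll(ent->stream_fileset);
  pfree(ent->stream_fileset);
  if (ent->subxact_fileset)
  {
      SharedFileSetDeleteAll(ent->subxact_fileset);
      pfree(ent->subxact_fileset);
  }

  /* only now remove it, and do not touch ent afterwards */
  hash_search(xidhash, &xid, HASH_REMOVE, &found);
  Assert(found);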

--
Kind Regards,
Peter Smith
Fujitsu Australia.




Re: Replication slot stats misgivings

2021-03-21 Thread Andres Freund
Hi,

On 2021-03-21 16:08:00 +0530, Amit Kapila wrote:
> On Sun, Mar 21, 2021 at 2:57 AM Andres Freund  wrote:
> > On 2021-03-20 10:28:06 +0530, Amit Kapila wrote:
> > > On Sat, Mar 20, 2021 at 9:25 AM Amit Kapila  
> > > wrote:
> > > > This idea is worth exploring to address the complaints but what do we
> > > > do when we detect that the stats are from a different slot? It has
> > > > a mix of stats from the old and new slot. We probably need to reset it
> > > > after we detect that.
> > > >
> > >
> > > What if the user created a slot with the same name after dropping the
> > > slot and it used the same index? I think the chances are low, but it's
> > > still a possibility; maybe that is okay.
> > >
> > > > What if, at some frequency (say whenever we
> > > > run out of indexes), we check whether the slots we are maintaining in
> > > > pgstat.c have some stale slot entry (the entry exists but the actual slot
> > > > is dropped)?
> > > >
> > >
> > > A similar drawback (the user created a slot with the same name after
> > > dropping it) exists with this as well.
> >
> > pgstat_report_replslot_drop() already prevents that, no?
> >
> 
> Yeah, normally it would prevent that but what if a drop message is lost?

That already exists as a danger, no? pgstat_recv_replslot() uses
pgstat_replslot_index() to find the slot by name. So if a drop message
is lost we'd potentially accumulate into the stats of an older slot.  It'd
probably be a lower risk with what I suggested, because the initial stat
report from slot.c would use something like pgstat_report_replslot_create(),
which the stats collector can use to reset the stats to 0?
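
A rough sketch of that idea, purely for illustration (the function and
the m_create flag are hypothetical, modeled on the existing
pgstat_report_replslot_drop()):

    /* slot.c side: announce the new slot so its stats start from zero. */
    void
    pgstat_report_replslot_create(const char *slotname)
    {
        PgStat_MsgReplSlot msg;

        pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REPLSLOT);
        strlcpy(msg.m_slotname, slotname, NAMEDATALEN);
        msg.m_create = true;    /* hypothetical: tells the collector to reset */
        msg.m_drop = false;
        pgstat_send(&msg, sizeof(PgStat_MsgReplSlot));
    }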

If we do it right the lossiness will be removed via shared memory stats
patch... But architecturally the name based lookup and unpredictable
number of stats doesn't fit in super well.

Greetings,

Andres Freund




Re: [HACKERS] Custom compression methods

2021-03-21 Thread Tom Lane
... btw, now that I look at this, why are we expending a configure
probe for <lz4/lz4.h>?  If we need to cater for that spelling of
the header name, the C code proper is not ready for it.
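
For reference, making the C code "ready for it" would look something
like this (a sketch; the HAVE_* macro names are assumed to follow the
usual autoconf convention):

    #if defined(HAVE_LZ4_H)
    #include <lz4.h>
    #elif defined(HAVE_LZ4_LZ4_H)
    #include <lz4/lz4.h>        /* alternate header location */
    #else
    #error "lz4 support requested, but no lz4 header was found"
    #endif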

regards, tom lane




Re: [HACKERS] Custom compression methods

2021-03-21 Thread Tom Lane
Justin Pryzby  writes:
> Rebased on HEAD.
> 0005 forgot to update compression_1.out.
> Included changes to ./configure.ac and some other patches, but not Tomas's,
> since it'll make CFBOT get mad as soon as that's pushed.

I pushed a version of the configure fixes that passes my own sanity
checks, and removes the configure warning with MacPorts.  That
obsoletes your 0006.  Of the rest, I prefer the 0009 approach
(make the GUC an enum) to 0008, and the others seem sane but I haven't
studied the code, so I'll leave it to Robert to handle them.

regards, tom lane




Re: [PATCH] Bug fix in initdb output

2021-03-21 Thread Andrew Dunstan


On 3/2/21 9:32 AM, Alvaro Herrera wrote:
> On 2021-Mar-02, Nitin Jadhav wrote:
>
>>> FWIW, I don't think that it is a good idea to come back to this
>>> decision for *nix platforms, so I would let it as-is, and use
>>> relative paths if initdb is called using a relative path.
>> The command to be displayed either in absolute path or relative path
>> depends on the way the user is calling initdb. I agree with the above
>> comment to keep this behaviour as-is, that is if the initdb is called
>> using relative path, then it displays relative path otherwise absolute
>> path.
> Yeah.
>
>>> However, if you can write something that makes the path printed
>>> compatible for a WIN32 terminal while keeping the behavior
>>> consistent with other platforms, people would have no objections to
>>> that.
>> I feel the patch attached above handles this scenario.
> Agreed.  I'll push the original patch then.  Thanks everybody for the
> discussion.
>


I'm late to the party on this, which is my fault; I had my head down on
other stuff. But I just noticed this commit. Unfortunately the commit
message and this thread contain suggestions that aren't true. How do I
know? Because the buildfarm has for years called pg_ctl via cmd.exe with
a relative path with forward slashes. Here's the relevant perl code that
runs on all platforms:


    chdir($installdir);
    my $cmd =
  qq{"bin/pg_ctl" -D data-$locale -l logfile -w start >startlog
2>&1};
    system($cmd);


Note that the pg_ctl path is quoted, and those quotes are passed through
to cmd.exe. That's what makes it work. It's possibly not worth changing
it now, but if anything that's the change that should have been made here.


Just wanted that on the record in case people got the wrong idea that
you can't use forward slashes when calling a program in cmd.exe.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-21 Thread Tom Lane
Justin Pryzby  writes:
> On Sun, Mar 21, 2021 at 04:32:31PM -0400, Tom Lane wrote:
>> This seems somewhat repeatable (three identical failures in three
>> attempts).  Not sure why I did not see it yesterday; but anyway,
>> there is something wrong with partial detoasting for LZ4.

> With what version of LZ4 ?

RHEL8's, which is
lz4-1.8.3-2.el8.x86_64

regards, tom lane




Re: Built-in connection pooler

2021-03-21 Thread Zhihong Yu
Hi,

+  With load-balancing policy postmaster choose
proxy with lowest load average.
+  Load average of proxy is estimated by number of clients
connection assigned to this proxy with extra weight for SSL connections.

I think 'load-balanced' may be better than 'load-balancing'.
postmaster choose proxy -> postmaster chooses proxy

+  Load average of proxy is estimated by number of clients
connection assigned to this proxy with extra weight for SSL connections.

I wonder if there would be a mixture of connections with and without SSL.

+ Terminate an idle connection pool worker after the specified
number of milliseconds.

Should the time unit be seconds ? It seems a worker would exist for at
least a second.

+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group

It would be better to update the year in the header.

+* Use then for launching pooler worker backends and report error

Not sure I understand the above sentence. Did you mean 'them' instead of
'then' ?

Cheers

On Sun, Mar 21, 2021 at 11:32 AM Konstantin Knizhnik 
wrote:

> People asked me to resubmit built-in connection pooler patch to commitfest.
> Rebased version of connection pooler is attached.
>
>


Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-21 Thread Justin Pryzby
On Sun, Mar 21, 2021 at 04:32:31PM -0400, Tom Lane wrote:
> This seems somewhat repeatable (three identical failures in three
> attempts).  Not sure why I did not see it yesterday; but anyway,
> there is something wrong with partial detoasting for LZ4.

With what version of LZ4 ?

-- 
Justin




Re: [HACKERS] Custom compression methods (mac+lz4.h)

2021-03-21 Thread Tom Lane
Dilip Kumar  writes:
>> Yeah, we need to set the default_toast_compression in the beginning of
>> the test as attached.
> In the last patch, I did not adjust the compression_1.out so fixed
> that in the attached patch.

Pushed that; however, while testing that it works as expected,
I saw a new and far more concerning regression diff:

diff -U3 /home/postgres/pgsql/src/test/regress/expected/strings.out 
/home/postgres/pgsql/src/test/regress/results/strings.out
--- /home/postgres/pgsql/src/test/regress/expected/strings.out  2021-02-18 
10:34:58.190304138 -0500
+++ /home/postgres/pgsql/src/test/regress/results/strings.out   2021-03-21 
16:27:22.029402834 -0400
@@ -1443,10 +1443,10 @@
 -- If start plus length is > string length, the result is truncated to
 -- string length
 SELECT substr(f1, 5, 10) from toasttest;
- substr 
-
- 567890
- 567890
+ substr 
+
+ 567890\x7F\x7F\x7F\x7F
+ 567890\x7F\x7F\x7F\x7F
  567890
  567890
 (4 rows)
@@ -1520,10 +1520,10 @@
 -- If start plus length is > string length, the result is truncated to
 -- string length
 SELECT substr(f1, 5, 10) from toasttest;
- substr 
-
- 567890
- 567890
+ substr 
+
+ 567890\177\177\177\177
+ 567890\177\177\177\177
  567890
  567890
 (4 rows)

This seems somewhat repeatable (three identical failures in three
attempts).  Not sure why I did not see it yesterday; but anyway,
there is something wrong with partial detoasting for LZ4.

regards, tom lane




Re: Fix pg_upgrade to preserve datdba

2021-03-21 Thread Tom Lane
Jan Wieck  writes:
> So let's focus on the actual problem of running out of XIDs and memory 
> while doing the upgrade involving millions of small large objects.

Right.  So as far as --single-transaction vs. --create goes, that's
mostly a definitional problem.  As long as the contents of a DB are
restored in one transaction, it's not gonna matter if we eat one or
two more XIDs while creating the DB itself.  So we could either
relax pg_restore's complaint, or invent a different switch that's
named to acknowledge that it's not really only one transaction.

That still leaves us with the lots-o-locks problem.  However, once
we've crossed the Rubicon of "it's not really only one transaction",
you could imagine that the switch is "--fewer-transactions", and the
idea is for pg_restore to commit after every (say) 10 operations.
That would both bound its lock requirements and greatly cut its XID
consumption.
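
As a concrete sketch of that only (names invented; not pg_restore's
actual structure):

    /*
     * Hypothetical core of a --fewer-transactions mode: commit after
     * every batch_size restored TOC entries, bounding both the locks
     * held at any one time and the total XID consumption.
     */
    static void
    restore_in_batches(ArchiveHandle *AH, int batch_size)
    {
        TocEntry   *te;
        int         n_in_xact = 0;

        StartTransaction(&AH->public);
        for (te = AH->toc->next; te != AH->toc; te = te->next)
        {
            restore_one_entry(AH, te);      /* invented helper */
            if (++n_in_xact >= batch_size)
            {
                CommitTransaction(&AH->public); /* releases this batch's locks */
                StartTransaction(&AH->public);
                n_in_xact = 0;
            }
        }
        CommitTransaction(&AH->public);
    }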

The work you described sounded like it could fit into that paradigm,
with the additional ability to run some parallel restore tasks
that are each consuming a bounded number of locks.

regards, tom lane




[PATCH] In psql \?, add [+] annotation where appropriate

2021-03-21 Thread Matthijs van der Vleuten
Hi,

User inoas on IRC channel #postgresql noted that \? does not describe \do as 
supporting the + option. It does however support this option, as do \dAp and 
\dy.

This patch adds the annotation to the description of these commands in \?.

While adding it to the translation files I noticed some obvious errors, so this 
patch fixes the following as well:

* correct inconsistent alignment. This was especially egregious in fr.po. 
Differing alignment across sections in es.po is preserved as this may have been 
a deliberate choice by the translator.
* cs.po: remove extraneous newline
* es.po: replace incorrect mention of \do with the correct \dAc, \dAf and \dAo
* fr.po: merge some small lines that still fit within 78 chars
* fr.po: remove [brackets] denoting optional parameters, when these aren't 
present in the English text
* fr.po: \sv was shown as accepting "FONCTION". This is replaced with "VUE".
* fr.po: \t was missing its optional [on|off] parameter.
* tr.po: fix typo at \d
* uk.po: add missing newline at \c

Regards,
Matthijs

From 285c679e1bff0a35cd400828182aca53c93083b1 Mon Sep 17 00:00:00 2001
From: Matthijs van der Vleuten 
Date: Sun, 21 Mar 2021 20:24:37 +0100
Subject: [PATCH] In psql \?, add [+] annotation.

User inoas on IRC channel #postgresql noted that according to \?, \do is not described as supporting the + option while it does in fact recognize the option. Additionally, \dAp and \dy also support +. This patch adds the annotation to \?.

Also fix some obvious errors in translations of the \? text:

* correct inconsistent alignment. This was especially egregious in fr.po. Differing alignment across sections in es.po is preserved as this may have been a deliberate choice by the translator.
* cs.po: remove extraneous newline
* es.po: replace incorrect mention of \do with the correct \dAc, \dAf and \dAo
* fr.po: merge some small lines that still fit within 78 chars
* fr.po: remove [brackets] denoting optional parameters, when these aren't present in the English text
* fr.po: \sv was shown as accepting "FONCTION". This is replaced with "VUE".
* fr.po: \t was missing its optional [on|off] parameter.
* tr.po: fix typo at \d
* uk.po: add missing newline at \c
---
 src/bin/psql/help.c  |   6 +-
 src/bin/psql/po/cs.po|  34 +++---
 src/bin/psql/po/de.po|  12 +--
 src/bin/psql/po/es.po|  22 ++--
 src/bin/psql/po/fr.po| 227 +++
 src/bin/psql/po/it.po|   8 +-
 src/bin/psql/po/ja.po|  10 +-
 src/bin/psql/po/ko.po|  10 +-
 src/bin/psql/po/ru.po|   8 +-
 src/bin/psql/po/sv.po|  14 +--
 src/bin/psql/po/tr.po|  58 +-
 src/bin/psql/po/uk.po|  44 
 src/bin/psql/po/zh_CN.po |   8 +-
 13 files changed, 227 insertions(+), 234 deletions(-)

diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index d027a4e3f0..640035f588 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -231,7 +231,7 @@ slashUsage(unsigned short int pager)
 	fprintf(output, _("  \\dAc[+] [AMPTRN [TYPEPTRN]]  list operator classes\n"));
 	fprintf(output, _("  \\dAf[+] [AMPTRN [TYPEPTRN]]  list operator families\n"));
 	fprintf(output, _("  \\dAo[+] [AMPTRN [OPFPTRN]]   list operators of operator families\n"));
-	fprintf(output, _("  \\dAp[AMPTRN [OPFPTRN]]   list support functions of operator families\n"));
+	fprintf(output, _("  \\dAp[+] [AMPTRN [OPFPTRN]]   list support functions of operator families\n"));
 	fprintf(output, _("  \\db[+]  [PATTERN]  list tablespaces\n"));
 	fprintf(output, _("  \\dc[S+] [PATTERN]  list conversions\n"));
 	fprintf(output, _("  \\dC[+]  [PATTERN]  list casts\n"));
@@ -254,7 +254,7 @@ slashUsage(unsigned short int pager)
 	fprintf(output, _("  \\dL[S+] [PATTERN]  list procedural languages\n"));
 	fprintf(output, _("  \\dm[S+] [PATTERN]  list materialized views\n"));
 	fprintf(output, _("  \\dn[S+] [PATTERN]  list schemas\n"));
-	fprintf(output, _("  \\do[S]  [PATTERN]  list operators\n"));
+	fprintf(output, _("  \\do[S+] [PATTERN]  list operators\n"));
 	fprintf(output, _("  \\dO[S+] [PATTERN]  list collations\n"));
 	fprintf(output, _("  \\dp [PATTERN]  list table, view, and sequence access privileges\n"));
 	fprintf(output, _("  \\dP[itn+] [PATTERN]list [only index/table] partitioned relations [n=nested]\n"));
@@ -267,7 +267,7 @@ slashUsage(unsigned short int pager)
 	fprintf(output, _("  \\du[S+] [PATTERN]  list roles\n"));
 	fprintf(output, _("  \\dv[S+] [PATTERN]  list views\n"));
 	fprintf(output, _("  \\dx[+]  [PATTERN]  list extensions\n"));
-	fprintf(output, _("  \\dy [PATTERN]  list event triggers\n"));
+	fprintf(output, _("  \\dy[+]  [PATTERN]  list event triggers\n"));
 	fprintf(output, _("  \\l[+]   [PATTERN]  list databases\n"));
 	fprintf(output, _("  \\sf[+]  FUNCNAME   show a function's definition\n"));
 	fprintf(output, _("  \\sv[+]  VIEWNAME   show a view's definition\n"));
diff --git 

Re: Fix pg_upgrade to preserve datdba

2021-03-21 Thread Jan Wieck

On 3/21/21 2:34 PM, Tom Lane wrote:

and I see

--
-- Name: joe; Type: DATABASE; Schema: -; Owner: joe
--

CREATE DATABASE joe WITH TEMPLATE = template0 ENCODING = 'SQL_ASCII' LOCALE = 
'C';


ALTER DATABASE joe OWNER TO joe;

so at least in this case it's doing the right thing.  We need a bit
more detail about the context in which it's doing the wrong thing
for you.


After moving all of this to a pristine postgresql.org-based repo I see 
the same. My best guess at this point is that the permission hoops that 
RDS and Aurora PostgreSQL are jumping through were messing with this. 
But that has nothing to do with the actual topic.


So let's focus on the actual problem of running out of XIDs and memory 
while doing the upgrade involving millions of small large objects.



Regards, Jan


--
Jan Wieck
Principle Database Engineer
Amazon Web Services




Re: Different compression methods for FPI

2021-03-21 Thread Justin Pryzby
rebased to keep cfbot happy.  This will run with default=zlib.

On Mon, Mar 15, 2021 at 01:09:18PM -0500, Justin Pryzby wrote:
> On Sun, Mar 14, 2021 at 07:31:35PM -0500, Justin Pryzby wrote:
> > On Sat, Mar 13, 2021 at 08:48:33PM +0500, Andrey Borodin wrote:
> > > > 13 марта 2021 г., в 06:28, Justin Pryzby  
> > > > написал(а):
> > > > Updated patch with a minor fix to configure.ac to avoid warnings on OSX.
> > > > And 2ndary patches from another thread to allow passing recovery tests.
> > > > Renamed to WAL_COMPRESSION_*
> > > > Split LZ4 support to a separate patch and support zstd.  These show that
> > > > changes needed for a new compression method have been minimized, 
> > > > although not
> > > > yet moved to a separate, abstracted compression/decompression function.
> > > 
> > > Thanks! Awesome work!
> > > 
> > > > These two patches are a prerequisite for this patch to progress:
> > > > * Run 011_crash_recovery.pl with wal_level=minimal
> > > > * Make sure published XIDs are persistent
> > > > 
> > > > I don't know if anyone will consider this patch for v14 - if not, it 
> > > > should be
> > > > set to v15 and revisit in a month.  
> > > 
> > > I want to note, that fixes for 011_crash_recovery.pl are not strictly 
> > > necessary for this patch set.
> > > The problem in tests arises if we turn on wal_compression, absolutely 
> > > independently from wal compression method.
> > > We turn on wal_compression in this test only for CI purposes.
> > 
> > I rearranged the patches to reflect this.
> > Change to zlib and zstd to level=1.
> > Add support for negative "zstd fast" levels.
> > Use correct length accounting for "hole" in LZ4 and ZSTD.
> > Fixed Solution.pm for zstd on windows.
> > Switch to zstd by default (for CI).
> > Add docs.
> 
> Changes:
> - Allocate buffer sufficient to accommodate any supported compression method;
> - Use existing info flags argument rather than adding another byte for storing
>   the compression method; this seems to be what was anticipated by commit
>   57aa5b2bb and what Michael objected to.
> 
> I think the existing structures are ugly, so maybe this suggests using a GUC
> assign hook to support arbitrary compression level, and maybe other options.
>From 5a7a38464dfe427a8eaf9f3bf054c69513a1e0d7 Mon Sep 17 00:00:00 2001
From: Andrey Borodin 
Date: Sat, 27 Feb 2021 09:03:50 +0500
Subject: [PATCH 01/12] Allow alternate compression methods for wal_compression

TODO: bump XLOG_PAGE_MAGIC
---
 doc/src/sgml/config.sgml  | 17 +
 src/backend/Makefile  |  2 +-
 src/backend/access/transam/xlog.c | 10 +++
 src/backend/access/transam/xloginsert.c   | 67 ---
 src/backend/access/transam/xlogreader.c   | 64 +-
 src/backend/utils/misc/guc.c  | 11 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h |  1 +
 src/include/access/xlog_internal.h| 16 +
 src/include/access/xlogrecord.h   | 11 ++-
 10 files changed, 187 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ee4925d6d9..bed89d055d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3072,6 +3072,23 @@ include_dir 'conf.d'
   
  
 
+ 
+  wal_compression_method (enum)
+  
+   wal_compression_method configuration parameter
+  
+  
+  
+   
+This parameter selects the compression method used to compress WAL when
+wal_compression is enabled.
+The supported methods are pglz and zlib.
+The default value is pglz.
+Only superusers can change this setting.
+   
+  
+ 
+
  
   wal_init_zero (boolean)
   
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 0da848b1fd..3af216ddfc 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -48,7 +48,7 @@ OBJS = \
 LIBS := $(filter-out -lpgport -lpgcommon, $(LIBS)) $(LDAP_LIBS_BE) $(ICU_LIBS)
 
 # The backend doesn't need everything that's in LIBS, however
-LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
+LIBS := $(filter-out -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
 
 ifeq ($(with_systemd),yes)
 LIBS += -lsystemd
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6f8810e149..5b657ec724 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -99,6 +99,7 @@ bool		EnableHotStandby = false;
 bool		fullPageWrites = true;
 bool		wal_log_hints = false;
 bool		wal_compression = false;
+int			wal_compression_method = WAL_COMPRESSION_PGLZ;
 char	   *wal_consistency_checking_string = NULL;
 bool	   *wal_consistency_checking = NULL;
 bool		wal_init_zero = true;
@@ -180,6 +181,15 @@ const struct config_enum_entry recovery_target_action_options[] = {
 	{NULL, 0, false}
 };
 
+/* Note that due to 

Re: Fix pg_upgrade to preserve datdba

2021-03-21 Thread Tom Lane
I wrote:
> ... so at least in this case it's doing the right thing.  We need a bit
> more detail about the context in which it's doing the wrong thing
> for you.

Just to cross-check, I tried modifying pg_upgrade's regression test
as attached, and it still passes.  (And inspection of the leftover
dump2.sql file verifies that the database ownership was correct.)
So I'm not sure what's up here.

regards, tom lane

diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh
index 9c6deae294..436646b5ba 100644
--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -150,6 +150,9 @@ export EXTRA_REGRESS_OPTS
 standard_initdb "$oldbindir"/initdb
 "$oldbindir"/pg_ctl start -l "$logdir/postmaster1.log" -o "$POSTMASTER_OPTS" -w
 
+# Create another user (just to exercise database ownership restoration).
+createuser regression_dbowner || createdb_status=$?
+
 # Create databases with names covering the ASCII bytes other than NUL, BEL,
 # LF, or CR.  BEL would ring the terminal bell in the course of this test, and
 # it is not otherwise a special case.  PostgreSQL doesn't support the rest.
@@ -160,7 +163,7 @@ dbname1='\"\'$dbname1'\\"\\\'
 dbname2=`awk 'BEGIN { for (i = 46; i <  91; i++) printf "%c", i }' 

Re: default result formats setting

2021-03-21 Thread Emre Hasegeli
> Could you look into the log files in that test directory what is going
> on?

Command 'test-result-format' not found in
/Users/hasegeli/Developer/postgres/tmp_install/Users/hasegeli/.local/pgsql/bin,
/Users/hasegeli/.local/bin, /opt/homebrew/bin, /usr/local/bin,
/usr/bin, /bin, /usr/sbin, /sbin,
/Users/hasegeli/Developer/postgres/src/test/modules/libpq_extended at
/Users/hasegeli/Developer/postgres/src/test/modules/libpq_extended/../../../../src/test/perl/TestLib.pm
line 818.

Maybe you forgot to commit the file in the test?

> The test setup is closely modeled after
> src/test/modules/libpq_pipeline/.  Does that one run ok?

Yes




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-21 Thread Justin Pryzby
On Sun, Mar 21, 2021 at 01:14:20PM -0500, Justin Pryzby wrote:
> On Sun, Mar 21, 2021 at 03:01:15PM -0300, Alvaro Herrera wrote:
> > > But note that it doesn't check if an existing constraint "implies" the new
> > > constraint - maybe it should.
> > 
> > Hm, I'm not sure I want to do that, because that means that if I later
> > have to attach the partition again with the same partition bounds, then
> > I might have to incur a scan to recheck the constraint.  I think we want
> > to make the new constraint be as tight as possible ...
> 
> The ATTACH PARTITION checks if any existing constraint implies the (proposed)
> partition bounds, not just if constraints are equal.  So I'm suggesting to do
> the same here.

On Sun, Mar 21, 2021 at 04:07:12PM -0300, Alvaro Herrera wrote:
> On 2021-Mar-21, Justin Pryzby wrote:
> > On Sun, Mar 21, 2021 at 03:22:00PM -0300, Alvaro Herrera wrote:
> > > So if we do that on DETACH, what would happen on ATTACH?
> > 
> > Do you mean what happens to the constraint that was already there ?
> > Nothing, since it's not ours to mess with.  Checking ImpliedBy() rather than
> > equal() doesn't change that.
> 
> No, I meant what happens regarding checking existing values in the
> table: is the table scanned even if the partition constraint is implied
> by existing table constraints?

I'm still not sure we're talking about the same thing.

Your patch adds a CHECK constraint during DETACH CONCURRENTLY, and I suggested
that it should avoid adding it if it's redundant with an existing constraint,
even if not equal().

The current behavior (since v10) is this:

postgres=# ALTER TABLE p ATTACH PARTITION p1 FOR VALUES FROM (1)TO(2);
DEBUG:  partition constraint for table "p1" is implied by existing constraints
ALTER TABLE

And that wouldn't change, except the CHECK constraint would be added
automatically during detach (if it wasn't already implied).  Maybe the CHECK
constraint should be added without CONCURRENTLY, too.  One fewer difference in
behavior.

-- 
Justin




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-21 Thread Alvaro Herrera
On 2021-Mar-21, Justin Pryzby wrote:

> On Sun, Mar 21, 2021 at 03:22:00PM -0300, Alvaro Herrera wrote:
>
> > So if we do that on DETACH, what would happen on ATTACH?
> 
> Do you mean what happens to the constraint that was already there ?
> Nothing, since it's not ours to mess with.  Checking ImpliedBy() rather than
> equal() doesn't change that.

No, I meant what happens regarding checking existing values in the
table: is the table scanned even if the partition constraint is implied
by existing table constraints?

> I proposed this a few years ago for DETACH (without concurrently), 
> specifically
> to avoid the partition scans.
> https://www.postgresql.org/message-id/20180601221428.gu5...@telsasoft.com
> |The docs say: if detaching/re-attach a partition, should first ADD CHECK to
> |avoid a slow ATTACH operation.  Perhaps DETACHing a partition could
> |implicitly CREATE a constraint which is usable when reATTACHing?

Well, I agree with you that we should add such a constraint.

-- 
Álvaro Herrera   Valdivia, Chile
"The problem with the future is that it keeps turning into the present"
(Hobbes)




Re: default result formats setting

2021-03-21 Thread Peter Eisentraut



On 19.03.21 15:55, Emre Hasegeli wrote:

I applied the patch, tried running the test and got the following:

rm -rf 
'/Users/hasegeli/Developer/postgres/src/test/modules/libpq_extended'/tmp_check
/bin/sh ../../../../config/install-sh -c -d 
'/Users/hasegeli/Developer/postgres/src/test/modules/libpq_extended'/tmp_check
cd . && TESTDIR='/Users/hasegeli/Developer/postgres/src/test/modules/libpq_extended' 
PATH="/Users/hasegeli/Developer/postgres/tmp_install/Users/hasegeli/.local/pgsql/bin:$PATH" 
DYLD_LIBRARY_PATH="/Users/hasegeli/Developer/postgres/tmp_install/Users/hasegeli/.local/pgsql/lib"
  PGPORT='65432' 
PG_REGRESS='/Users/hasegeli/Developer/postgres/src/test/modules/libpq_extended/../../../../src/test/regress/pg_regress'
 REGRESS_SHLIB='/Users/hasegeli/Developer/postgres/src/test/regress/regress.so' /usr/bin/prove -I 
../../../../src/test/perl/ -I .  t/*.pl
t/001_result_format.pl .. # Looks like your test exited with 2 before it could 
output anything.
t/001_result_format.pl .. Dubious, test returned 2 (wstat 512, 0x200)
Failed 4/4 subtests

Test Summary Report
---
t/001_result_format.pl (Wstat: 512 Tests: 0 Failed: 0)
   Non-zero exit status: 2
   Parse errors: Bad plan.  You planned 4 tests but ran 0.



Could you look into the log files in that test directory what is going 
on?  The test setup is closely modeled after 
src/test/modules/libpq_pipeline/.  Does that one run ok?





Re: Fix pg_upgrade to preserve datdba

2021-03-21 Thread Tom Lane
I wrote:
> Needs a little more work than that --- we should allow it to respond
> to the --no-owner switch, for example.  But I think likely we can do
> it where other object ownership is handled.  I'll look in a bit.

Actually ... said code already DOES do that, so now I'm confused.
I tried

regression=# create user joe;
CREATE ROLE
regression=# create database joe owner joe;
CREATE DATABASE
regression=# \q
$ pg_dump -Fc joe >joe.dump
$ pg_restore --create -f - joe.dump | more

and I see

--
-- Name: joe; Type: DATABASE; Schema: -; Owner: joe
--

CREATE DATABASE joe WITH TEMPLATE = template0 ENCODING = 'SQL_ASCII' LOCALE = 
'C';


ALTER DATABASE joe OWNER TO joe;

so at least in this case it's doing the right thing.  We need a bit
more detail about the context in which it's doing the wrong thing
for you.

regards, tom lane




Re: Built-in connection pooler

2021-03-21 Thread Konstantin Knizhnik

People asked me to resubmit built-in connection pooler patch to commitfest.
Rebased version of connection pooler is attached.

diff --git a/contrib/spi/refint.c b/contrib/spi/refint.c
index 6fbfef2b12..27aa6cba8e 100644
--- a/contrib/spi/refint.c
+++ b/contrib/spi/refint.c
@@ -11,6 +11,7 @@
 
 #include "commands/trigger.h"
 #include "executor/spi.h"
+#include "storage/proc.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -94,6 +95,8 @@ check_primary_key(PG_FUNCTION_ARGS)
 	else
 		tuple = trigdata->tg_newtuple;
 
+	MyProc->is_tainted = true;
+
 	trigger = trigdata->tg_trigger;
 	nargs = trigger->tgnargs;
 	args = trigger->tgargs;
@@ -286,6 +289,8 @@ check_foreign_key(PG_FUNCTION_ARGS)
 		/* internal error */
 		elog(ERROR, "check_foreign_key: cannot process INSERT events");
 
+	MyProc->is_tainted = true;
+
 	/* Have to check tg_trigtuple - tuple being deleted */
 	trigtuple = trigdata->tg_trigtuple;
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ee4925d6d9..4c862bbae9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -734,6 +734,169 @@ include_dir 'conf.d'
   
  
 
+ 
+  max_sessions (integer)
+  
+   max_sessions configuration parameter
+  
+  
+  
+
+  The maximum number of client sessions that can be handled by
+  one connection proxy when session pooling is enabled.
+  This parameter does not add any memory or CPU overhead, so
+  specifying a large max_sessions value
+  does not affect performance.
+  If the max_sessions limit is reached new connections are not accepted.
+
+
+  The default value is 1000. This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  session_pool_size (integer)
+  
+   session_pool_size configuration parameter
+  
+  
+  
+
+  Enables session pooling and defines the maximum number of
+  backends that can be used by client sessions for each database/user combination.
+  Launched non-tainted backends are never terminated even if there are no active sessions.
+  Backend is considered as tainted if client updates GUCs, creates temporary table or prepared statements.
+  Tainted backend can server only one client.
+
+
+  The default value is 10, so up to 10 backends will serve each database,
+
+  
+ 
+
+ 
+  proxy_port (integer)
+  
+   proxy_port configuration parameter
+  
+  
+  
+
+  Sets the TCP port for the connection pooler.
+  Clients connected to main "port" will be assigned dedicated backends,
+  while client connected to proxy port will be connected to backends through proxy which
+  performs transaction level scheduling. 
+   
+
+  The default value is 6543.
+
+  
+ 
+
+ 
+  connection_proxies (integer)
+  
+   connection_proxies configuration parameter
+  
+  
+  
+
+  Sets number of connection proxies.
+  Postmaster spawns separate worker process for each proxy. Postmaster scatters connections between proxies using one of scheduling policies (round-robin, random, load-balancing).
+  Each proxy launches its own subset of backends.
+  So maximal number of non-tainted backends is  session_pool_size*connection_proxies*databases*roles.
+   
+
+  The default value is 0, so session pooling is disabled.
+
+  
+ 
+
+ 
+  session_schedule (enum)
+  
+   session_schedule configuration parameter
+  
+  
+  
+
+  Specifies scheduling policy for assigning session to proxies in case of
+  connection pooling. Default policy is round-robin.
+
+
+  With round-robin policy postmaster cyclicly scatter sessions between proxies.
+
+
+  With random policy postmaster randomly choose proxy for new session.
+
+
+  With load-balancing policy postmaster choose proxy with lowest load average.
+  Load average of proxy is estimated by number of clients connection assigned to this proxy with extra weight for SSL connections.
+   
+  
+ 
+
+ 
+  idle_pool_worker_timeout (integer)
+  
+   idle_pool_worker_timeout configuration parameter
+  
+  
+  
+
+ Terminate an idle connection pool worker after the specified number of milliseconds.
+ The default value is 0, so pool workers are never terminated.
+   
+  
+ 
+
+ 
+  restart_pooler_on_reload (boolean)
+  
+   restart_pooler_on_reload configuration parameter
+  
+  
+  
+
+  Restart session pool workers once pg_reload_conf() is called.
+  The default 

Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-21 Thread Justin Pryzby
On Sun, Mar 21, 2021 at 03:22:00PM -0300, Alvaro Herrera wrote:
> On 2021-Mar-21, Justin Pryzby wrote:
> 
> > On Sun, Mar 21, 2021 at 03:01:15PM -0300, Alvaro Herrera wrote:
> > > > But note that it doesn't check if an existing constraint "implies" the 
> > > > new
> > > > constraint - maybe it should.
> > > 
> > > Hm, I'm not sure I want to do that, because that means that if I later
> > > have to attach the partition again with the same partition bounds, then
> > > I might have to incur a scan to recheck the constraint.  I think we want
> > > to make the new constraint be as tight as possible ...
> > 
> > The ATTACH PARTITION checks if any existing constraint implies the 
> > (proposed)
> > partition bounds, not just if constraints are equal.  So I'm suggesting to 
> > do
> > the same here.
> 
> So if we do that on DETACH, what would happen on ATTACH?

Do you mean what happens to the constraint that was already there ?
Nothing, since it's not ours to mess with.  Checking ImpliedBy() rather than
equal() doesn't change that.

I proposed this a few years ago for DETACH (without concurrently), specifically
to avoid the partition scans.
https://www.postgresql.org/message-id/20180601221428.gu5...@telsasoft.com
|The docs say: if detaching/re-attach a partition, should first ADD CHECK to
|avoid a slow ATTACH operation.  Perhaps DETACHing a partition could
|implicitly CREATE a constraint which is usable when reATTACHing?

-- 
Justin




Re: Fix pg_upgrade to preserve datdba

2021-03-21 Thread Tom Lane
Jan Wieck  writes:
>> On 3/21/21 12:57 PM, Tom Lane wrote:
>>> I think maybe what we have here is a bug in pg_restore, its
>>> --create switch ought to be trying to update the database's
>>> ownership.

> Thanks for that. I like this patch a lot better.

Needs a little more work than that --- we should allow it to respond
to the --no-owner switch, for example.  But I think likely we can do
it where other object ownership is handled.  I'll look in a bit.

regards, tom lane




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-21 Thread Alvaro Herrera
On 2021-Mar-21, Justin Pryzby wrote:

> On Sun, Mar 21, 2021 at 03:01:15PM -0300, Alvaro Herrera wrote:
> > > But note that it doesn't check if an existing constraint "implies" the new
> > > constraint - maybe it should.
> > 
> > Hm, I'm not sure I want to do that, because that means that if I later
> > have to attach the partition again with the same partition bounds, then
> > I might have to incur a scan to recheck the constraint.  I think we want
> > to make the new constraint be as tight as possible ...
> 
> > The ATTACH PARTITION checks if any existing constraint implies the (proposed)
> partition bounds, not just if constraints are equal.  So I'm suggesting to do
> the same here.

So if we do that on DETACH, what would happen on ATTACH?

-- 
Álvaro Herrera39°49'30"S 73°17'W




Re: pg_upgrade failing for 200+ million Large Objects

2021-03-21 Thread Andrew Dunstan


On 3/21/21 12:56 PM, Jan Wieck wrote:
> On 3/21/21 7:47 AM, Andrew Dunstan wrote:
>> One possible (probable?) source is the JDBC driver, which currently
>> treats all Blobs (and Clobs, for that matter) as LOs. I'm working on
>> improving that some: 
>
> You mean the user is using OID columns pointing to large objects and
> the JDBC driver is mapping those for streaming operations?
>
> Yeah, that would explain a lot.
>
>
>


Probably in most cases the database is designed by Hibernate, and the
front-end programmers know nothing at all of Oids or LOs; they just ask
for and get a Blob.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-21 Thread Justin Pryzby
On Sun, Mar 21, 2021 at 03:01:15PM -0300, Alvaro Herrera wrote:
> > But note that it doesn't check if an existing constraint "implies" the new
> > constraint - maybe it should.
> 
> Hm, I'm not sure I want to do that, because that means that if I later
> have to attach the partition again with the same partition bounds, then
> I might have to incur a scan to recheck the constraint.  I think we want
> to make the new constraint be as tight as possible ...

The ATTACH PARTITION checks if any existing constraint implies the (proposed)
partition bounds, not just if constraints are equal.  So I'm suggesting to do
the same here.

-- 
Justin




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-21 Thread Alvaro Herrera
On 2021-Mar-19, Alvaro Herrera wrote:

> diff --git a/src/backend/utils/cache/partcache.c 
> b/src/backend/utils/cache/partcache.c
> index 0fe4f55b04..6dfa3fb4a8 100644
> --- a/src/backend/utils/cache/partcache.c
> +++ b/src/backend/utils/cache/partcache.c
> @@ -352,16 +352,9 @@ generate_partition_qual(Relation rel)
>   return copyObject(rel->rd_partcheck);
>  
>   /*
> -  * Obtain parent relid; if it's invalid, then the partition is being
> -  * detached.  The constraint is NIL in that case, and let's cache that.
> +  * Obtain parent relid.  XXX explain why we need this
>*/
> - parentrelid = get_partition_parent(RelationGetRelid(rel));
> - if (parentrelid == InvalidOid)
> - {
> - rel->rd_partcheckvalid = true;
> - rel->rd_partcheck = NIL;
> - return NIL;
> - }
> + parentrelid = get_partition_parent(RelationGetRelid(rel), true);

One thing that makes me uneasy about this is that I don't understand
how this happens with your test of two psqls doing attach/detach.
(It is necessary for the case when the waiting concurrent detach is
canceled, and so this fix is necessary anyway).  In your test, no
waiting transaction is ever cancelled; so what is the period during
which the relation is not locked that causes this code to be hit?  I
fear that there's a bug in the lock protocol somewhere.

-- 
Álvaro Herrera   Valdivia, Chile
"In Europe they call me Niklaus Wirth; in the US they call me Nickel's worth.
 That's because in Europe they call me by name, and in the US by value!"




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-21 Thread Alvaro Herrera
On 2021-Mar-21, Justin Pryzby wrote:

> On Fri, Mar 19, 2021 at 10:57:37AM -0300, Alvaro Herrera wrote:
> > > Also, it "fails to avoid" adding duplicate constraints:
> > > 
> > > Check constraints:
> > > "c" CHECK (i IS NOT NULL AND i > 1 AND i < 2)
> > > "cc" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)
> > > "p1_check" CHECK (true)
> > > "p1_i_check" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)
> > 
> > Do you mean the "cc" and "p1_i_check" one?  I mean, if you already have
> 
> No, I started with c and cc, and it added the broken constraint p1_check 
> (which
> you say you've fixed) and the redundant constraint p1_i_check.  I guess that's
> what you meant.

Yes, that's what I meant.

> > a constraint in the partition that duplicates the partition constraint,
> > then during attach we still create our new constraint?  I guess a
> > solution to this would be to scan all constraints and see if any equals
> > the expression that the new one would have.  Sounds easy enough now that
> > I write it out loud.
> 
> But it looks like DetachAddConstraintIfNeeded already intended to do that:
> 
> +if (equal(constraintExpr, thisconstr))
> +return;

Hah, so I had already done it, but forgot.

> Actually, it appears your latest notpatch resolves both these issues.

Great.

> But note that it doesn't check if an existing constraint "implies" the new
> constraint - maybe it should.

Hm, I'm not sure I want to do that, because that means that if I later
have to attach the partition again with the same partition bounds, then
I might have to incur a scan to recheck the constraint.  I think we want
to make the new constraint be as tight as possible ...

-- 
Álvaro Herrera39°49'30"S 73°17'W




Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY

2021-03-21 Thread Justin Pryzby
On Fri, Mar 19, 2021 at 10:57:37AM -0300, Alvaro Herrera wrote:
> > Also, it "fails to avoid" adding duplicate constraints:
> > 
> > Check constraints:
> > "c" CHECK (i IS NOT NULL AND i > 1 AND i < 2)
> > "cc" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)
> > "p1_check" CHECK (true)
> > "p1_i_check" CHECK (i IS NOT NULL AND i >= 1 AND i < 2)
> 
> Do you mean the "cc" and "p1_i_check" one?  I mean, if you already have

No, I started with c and cc, and it added the broken constraint p1_check (which
you say you've fixed) and the redundant constraint p1_i_check.  I guess that's
what you meant.

> a constraint in the partition that duplicates the partition constraint,
> then during attach we still create our new constraint?  I guess a
> solution to this would be to scan all constraints and see if any equals
> the expression that the new one would have.  Sounds easy enough now that
> I write it out loud.

But it looks like DetachAddConstraintIfNeeded already intended to do that:

+    if (equal(constraintExpr, thisconstr))
+        return;



Actually, it appears your latest notpatch resolves both these issues.
But note that it doesn't check if an existing constraint "implies" the new
constraint - maybe it should.

-- 
Justin




Re: Fix pg_upgrade to preserve datdba

2021-03-21 Thread Jan Wieck

On 3/21/21 1:15 PM, Jan Wieck wrote:

On 3/21/21 12:57 PM, Tom Lane wrote:

Jan Wieck  writes:

On 3/20/21 12:39 AM, Jan Wieck wrote:

On the way pg_upgrade also mangles the pg_database.datdba
(all databases are owned by postgres after an upgrade; will submit a
separate patch for that as I consider that a bug by itself).



Patch attached.


Hmm, doesn't this lose all *other* database-level properties?

I think maybe what we have here is a bug in pg_restore, its
--create switch ought to be trying to update the database's
ownership.


Possibly. I didn't look into that route.


Thanks for that. I like this patch a lot better.


Regards, Jan

--
Jan Wieck
Principle Database Engineer
Amazon Web Services
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index f8bec3f..19c1e71 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3030,6 +3030,8 @@ dumpDatabase(Archive *fout)
 	resetPQExpBuffer(creaQry);
 	resetPQExpBuffer(delQry);
 
+	appendPQExpBuffer(creaQry, "ALTER DATABASE %s OWNER TO %s;\n", qdatname, dba);
+
 	if (strlen(datconnlimit) > 0 && strcmp(datconnlimit, "-1") != 0)
 		appendPQExpBuffer(creaQry, "ALTER DATABASE %s CONNECTION LIMIT = %s;\n",
 		  qdatname, datconnlimit);


Re: New IndexAM API controlling index vacuum strategies

2021-03-21 Thread Peter Geoghegan
On Sun, Mar 21, 2021 at 1:24 AM Greg Stark  wrote:
> What I've seen is an application that regularly ran ANALYZE on a
> table. This worked fine as long as vacuums took less than the interval
> between analyzes (in this case 1h) but once vacuum started taking
> longer than that interval autovacuum would cancel it every time due to
> the conflicting lock.
>
> That would have just continued until the wraparound vacuum which
> wouldn't self-cancel except that there was also a demon running which
> would look for sessions stuck on a lock and kill the blocker -- which
> included killing the wraparound vacuum.

That's a new one! Though clearly it's an example of what I described.
I do agree that sometimes the primary cause is the special rules for
cancellations with anti-wraparound autovacuums.

> And yes, this demon is obviously a terrible idea but of course it was
> meant for killing buggy user queries. It wasn't expecting to find
> autovacuum jobs blocking things.  The real surprise for that user was
> that VACUUM could be blocked by things that someone would reasonably
> want to run regularly like ANALYZE.

The infrastructure from my patch to eliminate the tupgone special case
(the patch that fully decouples index and heap vacuuming from pruning
and freezing) ought to enable smarter autovacuum cancellations. It
should be possible to make "canceling" an autovacuum worker actually
instruct the worker to consider the possibility of finishing off the
VACUUM operation very quickly, by simply ending index vacuuming (and
heap vacuuming). It should only be necessary to cancel when that
strategy won't work out, because we haven't finished all required
pruning and freezing yet -- which are the only truly essential tasks
of any "successful" VACUUM operation.

Maybe it would only be appropriate to do something like that for
anti-wraparound VACUUMs, which, as you say, don't get cancelled when
they block the acquisition of a lock (which is a sensible design,
though only because of the specific risk of not managing to advance
relfrozenxid). There wouldn't be a question of canceling an
anti-wraparound VACUUM in the conventional sense with this mechanism.
It would simply instruct the anti-wraparound VACUUM to finish as
quickly as possible by skipping the indexes. Naturally the
implementation wouldn't really need to consider whether that meant the
anti-wraparound VACUUM could end almost immediately, or some time
later -- the point is that it completes ASAP.

-- 
Peter Geoghegan




Re: Fix pg_upgrade to preserve datdba

2021-03-21 Thread Jan Wieck

On 3/21/21 12:57 PM, Tom Lane wrote:

Jan Wieck  writes:

On 3/20/21 12:39 AM, Jan Wieck wrote:

On the way pg_upgrade also mangles the pg_database.datdba
(all databases are owned by postgres after an upgrade; will submit a
separate patch for that as I consider that a bug by itself).



Patch attached.


Hmm, doesn't this lose all *other* database-level properties?

I think maybe what we have here is a bug in pg_restore, its
--create switch ought to be trying to update the database's
ownership.


Possibly. I didn't look into that route.


Regards, Jan

--
Jan Wieck
Principle Database Engineer
Amazon Web Services




Re: Fix pg_upgrade to preserve datdba (was: Re: pg_upgrade failing for 200+ million Large Objects)

2021-03-21 Thread Tom Lane
Jan Wieck  writes:
> On 3/20/21 12:39 AM, Jan Wieck wrote:
>> On the way pg_upgrade also mangles the pg_database.datdba
>> (all databases are owned by postgres after an upgrade; will submit a
>> separate patch for that as I consider that a bug by itself).

> Patch attached.

Hmm, doesn't this lose all *other* database-level properties?

I think maybe what we have here is a bug in pg_restore, its
--create switch ought to be trying to update the database's
ownership.

regards, tom lane




Re: pg_upgrade failing for 200+ million Large Objects

2021-03-21 Thread Jan Wieck

On 3/21/21 7:47 AM, Andrew Dunstan wrote:

One possible (probable?) source is the JDBC driver, which currently
treats all Blobs (and Clobs, for that matter) as LOs. I'm working on
improving that some: 


You mean the user is using OID columns pointing to large objects and the 
JDBC driver is mapping those for streaming operations?


Yeah, that would explain a lot.


Thanks, Jan

--
Jan Wieck
Principle Database Engineer
Amazon Web Services




Fix pg_upgrade to preserve datdba (was: Re: pg_upgrade failing for 200+ million Large Objects)

2021-03-21 Thread Jan Wieck

On 3/20/21 12:39 AM, Jan Wieck wrote:

On the way pg_upgrade also mangles the pg_database.datdba
(all databases are owned by postgres after an upgrade; will submit a
separate patch for that as I consider that a bug by itself).


Patch attached.


Regards, Jan

--
Jan Wieck
Principle Database Engineer
Amazon Web Services
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 5d9a26c..38f7202 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -344,6 +344,7 @@ get_db_infos(ClusterInfo *cluster)
 	DbInfo	   *dbinfos;
 	int			i_datname,
 i_oid,
+i_datdba,
 i_encoding,
 i_datcollate,
 i_datctype,
@@ -351,9 +352,12 @@ get_db_infos(ClusterInfo *cluster)
 	char		query[QUERY_ALLOC];
 
 	snprintf(query, sizeof(query),
-			 "SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, "
+			 "SELECT d.oid, d.datname, u.rolname, d.encoding, "
+			 "d.datcollate, d.datctype, "
 			 "%s AS spclocation "
 			 "FROM pg_catalog.pg_database d "
+			 " JOIN pg_catalog.pg_authid u "
+			 " ON d.datdba = u.oid "
 			 " LEFT OUTER JOIN pg_catalog.pg_tablespace t "
 			 " ON d.dattablespace = t.oid "
 			 "WHERE d.datallowconn = true "
@@ -367,6 +371,7 @@ get_db_infos(ClusterInfo *cluster)
 
 	i_oid = PQfnumber(res, "oid");
 	i_datname = PQfnumber(res, "datname");
+	i_datdba = PQfnumber(res, "rolname");
 	i_encoding = PQfnumber(res, "encoding");
 	i_datcollate = PQfnumber(res, "datcollate");
 	i_datctype = PQfnumber(res, "datctype");
@@ -379,6 +384,7 @@ get_db_infos(ClusterInfo *cluster)
 	{
 		dbinfos[tupnum].db_oid = atooid(PQgetvalue(res, tupnum, i_oid));
 		dbinfos[tupnum].db_name = pg_strdup(PQgetvalue(res, tupnum, i_datname));
+		dbinfos[tupnum].db_owner = pg_strdup(PQgetvalue(res, tupnum, i_datdba));
 		dbinfos[tupnum].db_encoding = atoi(PQgetvalue(res, tupnum, i_encoding));
 		dbinfos[tupnum].db_collate = pg_strdup(PQgetvalue(res, tupnum, i_datcollate));
 		dbinfos[tupnum].db_ctype = pg_strdup(PQgetvalue(res, tupnum, i_datctype));
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index e23b8ca..8fd9a13 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -378,18 +378,36 @@ create_new_objects(void)
 		 * propagate its database-level properties.
 		 */
 		if (strcmp(old_db->db_name, "postgres") == 0)
-			create_opts = "--clean --create";
+		{
+			parallel_exec_prog(log_file_name,
+			   NULL,
+			   "\"%s/pg_restore\" %s --exit-on-error "
+			   "--verbose --clean --create "
+			   "--dbname template1 \"%s\"",
+			   new_cluster.bindir,
+			   cluster_conn_opts(&new_cluster),
+			   sql_file_name);
+		}
 		else
-			create_opts = "--create";
-
-		parallel_exec_prog(log_file_name,
-		   NULL,
-		   "\"%s/pg_restore\" %s %s --exit-on-error --verbose "
-		   "--dbname template1 \"%s\"",
-		   new_cluster.bindir,
-		   cluster_conn_opts(&new_cluster),
-		   create_opts,
-		   sql_file_name);
+		{
+			exec_prog(log_file_name, NULL, true, true,
+	  "\"%s/createdb\" -O \"%s\" %s \"%s\"",
+	  new_cluster.bindir,
+	  old_db->db_owner,
+	  cluster_conn_opts(&new_cluster),
+	  old_db->db_name);
+			parallel_exec_prog(log_file_name,
+			   NULL,
+			   "\"%s/pg_restore\" %s --exit-on-error "
+			   "--verbose "
+			   "--dbname \"%s\" \"%s\"",
+			   new_cluster.bindir,
+			   cluster_conn_opts(&new_cluster),
+			   old_db->db_name,
+			   sql_file_name);
+		}
+
+
 	}
 
 	/* reap all children */
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a784..a3cda97 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -177,6 +177,7 @@ typedef struct
 {
 	Oid			db_oid;			/* oid of the database */
 	char	   *db_name;		/* database name */
+	char	   *db_owner;		/* database owner */
 	char		db_tablespace[MAXPGPATH];	/* database default tablespace
 			 * path */
 	char	   *db_collate;


Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-21 Thread Gilles Darold
Le 21/03/2021 à 15:53, Tom Lane a écrit :
> Chapman Flack  writes:
>> If this turns out to be a case of "attached the wrong patch, here's
>> the one that does implement foo_regex functions!" then I reserve an
>> objection to that. :)
>>

And the patch renamed.

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fee0561961..36e446cb7b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3097,6 +3097,66 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text [, position integer ] [, flags text ] )
+integer
+   
+   
+Returns the number of times a pattern occurs for a match of a POSIX
+regular expression to the string; see
+.
+   
+   
+regexp_count('123456789012', '\d{3}', 3)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
regexp_instr ( string text, pattern text [, position integer ] [, occurrence integer ] [, returnopt integer ] [, flags text ] [, group integer ] )
+integer
+   
+   
+   Returns the position within string where the
+   match of a POSIX regular expression occurs. It returns an integer
+   indicating the beginning or ending position of the matched substring,
+   depending on the value of the returnopt argument
+   (default beginning). If no match is found the function returns 0;
+   see .
+   
+   
+regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)
+7
+   
+  
+
+  
+   
+
+ regexp_like
+
+regexp_like ( string text, pattern text [, flags text ] )
+boolean
+   
+   
+Evaluates the existence of a match to a POSIX regular expression
+in string; see .
+   
+   
+regexp_like('Hello'||chr(10)||'world', '^world$', 'm')
+t
+   
+  
+
+
   

 
@@ -3144,7 +3204,7 @@ repeat('Pg', 4) PgPgPgPg
 
  regexp_replace
 
-regexp_replace ( string text, pattern text, replacement text [, flags text ] )
+regexp_replace ( string text, pattern text, replacement text [, position integer ] [, occurrence integer ] [, flags text ] )
 text


@@ -3157,6 +3217,24 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_substr
+
regexp_substr ( string text, pattern text [, position integer ] [, occurrence integer ] [, flags text ] [, group integer ] )
+text
+   
+   
+   Return the substring within string corresponding to the
+   match of a POSIX regular expression; see .
+   
+   
+regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)
+55
+   
+  
+
   

 
@@ -5295,6 +5373,18 @@ substring('foobar' similar '#"o_b#"%' escape '#')NULL
 regexp_split_to_array

+   
+regexp_like
+   
+   
+regexp_count
+   
+   
+regexp_instr
+   
+   
+regexp_substr
+   
 

  lists the available
@@ -5451,6 +5541,8 @@ substring('foobar' from 'o(.)b')   o
  It has the syntax
  regexp_replace(source,
  pattern, replacement
+ , position 
+ , occurrence 
  , flags ).
  The source string is returned unchanged if
  there is no match to the pattern.  If there is a
@@ -5464,12 +5556,19 @@ substring('foobar' from 'o(.)b')   o
  substring matching the entire pattern should be inserted.  Write
  \\ if you need to put a literal backslash in the replacement
  text.
- The flags parameter is an optional text
- string containing zero or more single-letter flags that change the
+ pattern is searched in string starting
+ from an optional position or from the beginning
+ of source by default.  Optional occurrence
+ parameter indicates which occurrence of pattern in
+ source should be replaced.  Default value for occurrence
+ is 1, replace only the first occurrence.  The flags parameter
+ is an optional text string containing zero or more single-letter flags that change the
  function's behavior.  Flag i specifies case-insensitive
  matching, while flag g specifies replacement of each matching
- substring rather than only the first one.  Supported flags (though
- not g) are
+ substring rather than only the first one. When occurrence
+ is set flag g has no effect. If occurrence
+ is set to zero, all occurrences are replaced which is similar to flag g.
+ Supported flags (though not g) are
  described in .
 
 
@@ -5482,6 +5581,10 @@ regexp_replace('foobarbaz', 'b..', 'X', 'g')
fooXX
 regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
fooXarYXazY
+regexp_replace('A PostgreSQL function', 

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-21 Thread Gilles Darold
Le 21/03/2021 à 15:53, Tom Lane a écrit :
> Chapman Flack  writes:
>> If this turns out to be a case of "attached the wrong patch, here's
>> the one that does implement foo_regex functions!" then I reserve an
>> objection to that. :)
> +1 to that.  Just to add a note, I do have some ideas about extending
> our regex parser so that it could duplicate the XQuery syntax --- none
> of the points we mention in 9.7.3.8 seem insurmountable.  I'm not
> planning to work on that in the near future, mind you, but I definitely
> think that we don't want to paint ourselves into a corner where we've
> already implemented the XQuery regex functions with the wrong behavior.
>
>   regards, tom lane


I apologize for the confusion caused by the words and phrases I used. This
patch implements the regexp_foo() functions, which are available in most
RDBMSes, with the behavior described in the documentation. I have modified
the title of the patch in the commitfest to remove the wrong use of XQUERY. 


I also don't know whether the other RDBMSes respect the XQuery behavior,
but from what I've seen of Oracle, it uses a limited set of regexp
modifiers, sometimes with a different letter than PostgreSQL for the same
behavior. I have implemented these functions with the Oracle behavior in
Orafce [1], with a function that checks the modifiers used. This patch
doesn't mimic the Oracle behavior; it uses the PostgreSQL regexp behavior,
the one used by regexp_replace() and regexp_matches(). All regexp
modifiers can be used.


[1] https://github.com/orafce/orafce/blob/master/orafce--3.14--3.15.sql


-- 
Gilles Darold
http://www.darold.net/



Re: Using COPY FREEZE in pgbench

2021-03-21 Thread Fabien COELHO




Attached is the v5 patch.


About v5: doc gen ok, global and local make check ok.

I did a few tests on my laptop. It seems that copying takes a little more 
time, say about 10%, but vacuum is indeed very significantly reduced, so 
that the total time for copying and vacuuming is reduced by 10% overall.


So it is okay for me.

--
Fabien.




Re: "has_column_privilege()" issue with attnums and non-existent columns

2021-03-21 Thread Tom Lane
Joe Conway  writes:
> On 3/16/21 2:45 PM, Joe Conway wrote:
>> Ian, or anyone else, any comments/complaints on my changes? If not I will 
>> commit
>> and push that version sooner rather than later.

I took a quick look, and I'm afraid I don't believe this:

!* We have to check for dropped attribute first, and we rely on the
!* syscache not to notice a concurrent drop before pg_attribute_aclcheck
!* fetches the row.

What the existing code is assuming is that if we do a successful
SearchSysCache check, then an immediately following pg_xxx_aclcheck call
that needs to examine that same row will also find that row in the cache,
because there will be no need for any actual database traffic to do that.

What you've done here is changed that pattern to be

SearchSysCache(row1)

SearchSysCache(row2)

aclcheck on row1

aclcheck on row2

But that does NOT offer any such guarantee, because the row2 cache lookup
could incur a database access, which might notice that row1 has been
deleted.

I think we may have to adjust the acl.c APIs, or maybe better provide new
entry points, so that we can have variants of pg_xxx_aclcheck that won't
throw a hard error upon not finding the row.  We cheesily tried to avoid
adjusting those APIs to support the semantics we need here, and we now see
that it didn't really work.
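
For reference, the user-visible goal is along these lines (a sketch of
the intended semantics under discussion, not committed behavior):

-- an attnum that does not exist should yield NULL, not an error
SELECT has_column_privilege('pg_class'::regclass, 999::int2, 'SELECT');
--> NULL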

regards, tom lane




Re: shared memory stats: high level design decisions: consistency, dropping

2021-03-21 Thread Tom Lane
Andres Freund  writes:
> 1) What kind of consistency do we want from the pg_stats_* views?

That's a hard choice to make.  But let me set the record straight:
when we did the initial implementation, the stats snapshotting behavior
was considered a FEATURE, not an "efficiency hack required by the old
storage model".

If I understand what you are proposing, all stats views would become
completely volatile, without even within-query consistency.  That really
is not gonna work.  As an example, you could get not-even-self-consistent
results from a join to a stats view if the planner decides to implement
it as a nestloop with the view on the inside.
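
Concretely, a probe like this (hypothetical, not from the thread) could
return rows under fully volatile stats if the planner happens to put one
side on the inside of a nestloop, while snapshot semantics guarantee it
returns nothing:

SELECT a.relname
FROM pg_stat_user_tables a
JOIN pg_stat_user_tables b USING (relid)
WHERE a.seq_scan <> b.seq_scan;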

I also believe that the snapshotting behavior has advantages in terms
of being able to perform multiple successive queries and get consistent
results from them.  Only the most trivial sorts of analysis don't need
that.

In short, what you are proposing sounds absolutely disastrous for
usability of the stats views, and I for one will not sign off on it
being acceptable.

I do think we could relax the consistency guarantees a little bit,
perhaps along the lines of only caching view rows that have already
been read, rather than grabbing everything up front.  But we can't
just toss the snapshot concept out the window.  It'd be like deciding
that nobody needs MVCC, or even any sort of repeatable read.

regards, tom lane




Re: "has_column_privilege()" issue with attnums and non-existent columns

2021-03-21 Thread Joe Conway

On 3/16/21 2:45 PM, Joe Conway wrote:

Ian, or anyone else, any comments/complaints on my changes? If not I will commit
and push that version sooner rather than later.


Any thoughts on back-patching this?

On one hand, in my view it is clearly a bug. On the other hand, no one has 
complained before and this does change user visible behavior.


I guess I am leaning toward not back-patching...

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: shared memory stats: high level design decisions: consistency, dropping

2021-03-21 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> I am working on Kyotaro Horiguchi's shared memory stats patch [1] with
> the goal of getting it into a shape that I'd be happy to commit.  That
> thread is quite long and most are probably skipping over new messages in
> it.

Awesome, +1.

> 1) What kind of consistency do we want from the pg_stats_* views?
> 
> Right now the first access to stats in a transaction will trigger a read
> of both the global and per-database stats from disk. If the on-disk
> state is too old, we'll ask the stats collector to write out a new file
> a couple times.
> 
> For the rest of the transaction that in-memory state is used unless
> pg_stat_clear_snapshot() is called. Which means that multiple reads from
> e.g. pg_stat_user_tables will show the same results as before [2].
> 
> That makes stats accesses quite expensive if there are lots of
> objects.
> 
> But it also means that separate stats accesses - which happen all the
> time - return something repeatable and kind of consistent.
> 
> Now, the stats aren't really consistent in the sense that they are
> really accurate, UDP messages can be lost, or only some of the stats
> generated by a TX might not yet have been received, other transactions
> haven't yet sent them. Etc.
> 
> 
> With the shared memory patch the concept of copying all stats for the
> current database into local memory at the time of the first stats access
> doesn't make sense to me.  Horiguchi-san had actually implemented that,
> but I argued that that would be cargo-culting an efficiency hack
> required by the old storage model forward.
> 
> The cost of doing this is substantial. On master, with a database that
> contains 1 million empty tables, any stats access takes ~0.4s and
> increases memory usage by 170MB.
> 
> 
> 1.1)
> 
> I hope everybody agrees that stats are not required to stay the way they
> were at the time of the first stats access in a transaction, even if that
> first access was to a different stat object than the currently accessed
> stat?

Agreed, that doesn't seem necessary and blowing up backend memory usage
by copying all the stats into local memory seems pretty terrible.

> 1.2)
> 
> Do we expect repeated accesses to the same stat to stay the same through
> the transaction?  The easiest way to implement stats accesses is to
> simply fetch the stats from shared memory ([3]). That would obviously
> result in repeated accesses to the same stat potentially returning
> changing results over time.
> 
> I think that's perfectly fine, desirable even, for pg_stat_*.

This seems alright to me.
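
For context, the current snapshot behavior is easy to see from psql
(table name arbitrary; this is existing behavior, not the patch):

BEGIN;
SELECT seq_scan FROM pg_stat_user_tables WHERE relname = 'pgbench_accounts';
-- re-reading returns the same value, regardless of concurrent activity
SELECT seq_scan FROM pg_stat_user_tables WHERE relname = 'pgbench_accounts';
SELECT pg_stat_clear_snapshot();  -- discard the cached snapshot
SELECT seq_scan FROM pg_stat_user_tables WHERE relname = 'pgbench_accounts';
COMMIT;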

> 1.3)
> 
> What kind of consistency do we expect between columns of views like
> pg_stat_all_tables?
> 
> Several of the stats views aren't based on SRFs or composite-type
> returning functions, but instead fetch each stat separately:
> 
> E.g. pg_stat_all_tables:
>  SELECT c.oid AS relid,
> n.nspname AS schemaname,
> c.relname,
> pg_stat_get_numscans(c.oid) AS seq_scan,
> pg_stat_get_tuples_returned(c.oid) AS seq_tup_read,
> sum(pg_stat_get_numscans(i.indexrelid))::bigint AS idx_scan,
> sum(pg_stat_get_tuples_fetched(i.indexrelid))::bigint + 
> pg_stat_get_tuples_fetched(c.oid) AS idx_tup_fetch,
> pg_stat_get_tuples_inserted(c.oid) AS n_tup_ins,
> ...
> pg_stat_get_autoanalyze_count(c.oid) AS autoanalyze_count
>FROM pg_class c
>  LEFT JOIN pg_index i ON c.oid = i.indrelid
> ...
> 
> Which means that if we do not cache stats, additional stats updates
> could have been applied between two stats accessors. E.g the seq_scan
> from before some pgstat_report_stat() but the seq_tup_read from after.
> 
> If we instead fetch all of a table's stats in one go, we would get
> consistency between the columns. But obviously that'd require changing
> all the stats views.
> 
> Horiguchi-san, in later iterations of the patch, attempted to address
> this issue by adding a one-entry caches below
> pgstat_fetch_stat_tabentry(), pgstat_fetch_stat_dbentry() etc, which is
> what pg_stat_get_numscans(), pg_stat_get_db_tuples_updated() etc use.
> 
> 
> But I think that leads to very confusing results. Access stats for the
> same relation multiple times in a row? Do not see updates. Switch
> between e.g. a table and its indexes? See updates.
> 
> 
> I personally think it's fine to have short-term divergences between the
> columns. The stats aren't that accurate anyway, as we don't submit them
> all the time.  And that if we want consistency between columns, we
> instead should replace the current view definitions with[set of] record
> returning function - everything else seems to lead to weird tradeoffs.

Agreed, doesn't seem like a huge issue to have short-term divergences
but if we want to fix them then flipping those all to SRFs would make
the most sense.
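
If the views were ever flipped over, the shape might be something like
this (the function name and signature here are entirely hypothetical,
sketched only to show where the consistency would come from):

SELECT s.seq_scan, s.seq_tup_read, s.n_tup_ins
FROM pg_class c,
     LATERAL pg_stat_get_table_entry(c.oid) s  -- hypothetical SRF
WHERE c.relname = 'pgbench_accounts';

Every column would then come from a single fetch of the shared entry,
giving per-row consistency without any caching.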

> 2) How to remove stats of dropped objects?
> 
> In the stats collector world stats for dropped objects (tables, indexes,
> functions, etc) are dropped after the 

Re: pl/pgsql feature request: shorthand for argument and local variable references

2021-03-21 Thread Pavel Stehule
On Fri 19. 3. 2021 at 14:14, Hannu Krosing wrote:

> On Thu, Mar 18, 2021 at 4:23 PM Pavel Stehule 
> wrote:
>
> > But we don't support this feature. We are changing just a top scope's
> label. So syntax "ALIAS FOR FUNCTION is not good. The user can have false
> hopes
>
> In this case it looks like it should go together with other labels and
> have << label_here >> syntax ?
>
> And we are back to the question of where to put this top scope label :)
>
> Maybe we could still pull the function arguments into the outermost
> block's scope, and advise users to have an extra block if they want to
> have the same names in both?
>
>
> CREATE OR REPLACE FUNCTION fx(a int, b int)
> RETURNS ... AS $$
> << fnargs >>
> BEGIN
>   << topblock >>
>   DECLARE
> a int := fnargs.a * 2;
> b int := topblock.a + 2;
>   BEGIN
> 
>   END;
> END;
> $$;
>

It looks pretty messy.


> and we could make the empty outer block optional and treat two
> consecutive labels as top scope label and outermost block label
>
> CREATE OR REPLACE FUNCTION fx(a int, b int)
> RETURNS ... AS $$
> << fnargs >>
> << topblock >>
>   DECLARE
> a int := fnargs.a * 2;
> b int := topblock.a + 2;
>   BEGIN
> 
> END;
> $$;
>
> But I agree that this is also not ideal.
>

I agree with you :). The problem is in the semantics - the top outer
namespace is external and not visible in the code, so placing a label for
it anywhere in the code looks scary.
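
For reference, the top scope already carries an implicit label today -
the function's own name - so qualification works without any new syntax
(existing behavior, minimal example):

CREATE FUNCTION fx(a int) RETURNS int AS $$
DECLARE
  a int := fx.a * 2;  -- fx.a is the parameter, a is the local variable
BEGIN
  RETURN a;
END;
$$ LANGUAGE plpgsql;

The whole discussion is about letting users pick a shorter alias for
that implicit label when the function name is long.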


>
> And
>
> CREATE OR REPLACE FUNCTION fx(a int, b int)
> WITH (TOPSCOPE_LABEL fnargs)
> RETURNS ... AS $$
> << topblock >>
>   DECLARE
> a int := fnargs.a * 2;
> b int := topblock.a + 2;
>   BEGIN
> 
> END;
> $$;
>

> Is even more inconsistent than #option syntax
>

Yes, this syntax makes sense. But it would have to be implemented from
scratch (although it would be only a few lines, so that is not a big issue).

The bigger issue is the risk of confusion with the "WITH ()" clause of
CREATE TABLE, because the syntax is exactly the same, but there it holds a
list of storage settings - and those do exactly what compile options do.

CREATE TABLE foo(...) WITH (autovacuum_enabled = off)

Please try to compare:

CREATE OR REPLACE FUNCTION fx(a int, b int)
WITH (TOPSCOPE_LABEL fnargs)
RETURNS ... AS $$
<< topblock >>
  DECLARE
a int := fnargs.a * 2;
b int := topblock.a + 2;

CREATE OR REPLACE FUNCTION fx(a int, b int)
RETURNS ... AS $$
#TOPSCOPE_LABEL fnargs
<< topblock >>
  DECLARE
a int := fnargs.a * 2;
b int := topblock.a + 2;

I don't see too big a difference, and with the compile option, I don't need
to introduce new, possibly confusing syntax. If we want to implement your
proposed syntax, then we should support all plpgsql compile options there
too (for consistency). The syntax that you propose has its logic. But I am
afraid of possible confusion with the same clause having different
semantics in other DDL commands, and then I am not sure the cost is
adequate to the benefit.

Regards

Pavel



>
> Cheers
> Hannu
>
> PS:
>
> >
> > For cleanness/orthogonality I would also prefer the block labels to be
> > in DECLARE for each block, but this train has already left :)
> > Though we probably could add alternative syntax ALIAS FOR BLOCK ?
>
> >
> > why? Is it a joke?
> >
> > you are defining a block label, and you want to in the same block
> redefine some outer label? I don't think it is a good idea.
>


Re: recovery_init_sync_method=wal

2021-03-21 Thread Stephen Frost
Greetings,

* Thomas Munro (thomas.mu...@gmail.com) wrote:
> 2.  You made a file system-level copy of a cluster that you shut down
> cleanly first, using cp, tar, scp, rsync, xmodem etc.  Now you start
> up the copy.  Its checkpoint is a forgery.  (Maybe our manual should
> mention this problem under "25.2. File System Level Backup" where it
> teaches you to rsync your cluster.)

Yes, it'd be good to get some updates to the backup documentation around
this which stresses in all cases that your backup utility should make
sure to fsync everything it restores.

> These are both scatter gun approaches that can sometimes do a lot of
> useless work, and I'd like to find a precise version that uses
> information we already have about what might be dirty according to the
> meaning of a checkpoint and a transaction log.  The attached patch
> does that as follows:
> 
> 1.  Sync the WAL using fsync(), to enforce the log-before-data rule.
> That's moved into the existing loop that scans the WAL files looking
> for temporary files to unlink.  (I suppose it should perhaps do the
> "presync" trick too.  Not done yet.)
> 
> 2.  While replaying the WAL, if we ever decide to skip a page because
> of its LSN, remember to fsync() the file in the next checkpoint anyway
> (because the data might be dirty in the kernel).   This way we sync
> all files that changed since the last checkpoint (even if we don't
> have to apply the change again).  (A more paranoid mode would mark the
> page dirty instead, so that we'll not only fsync() it, but we'll also
> write it out again.  This would defend against kernels that have
> writeback failure modes that include keeping changes but dropping
> their own dirty flag.  Not done here.)

Presuming that we do add to the documentation the language to document
what's assumed (and already done by modern backup tools) that they're
fsync'ing everything they're restoring, do we/can we have an option
which those tools could set that explicitly tells PG "everything in the
cluster has been fsync'd already, you don't need to do anything extra"?
Perhaps also/separately one for WAL that's restored with restore_command
if we think that's necessary?
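
For concreteness, such a switch would presumably hang off the existing
GUC - today only fsync and syncfs exist, 'wal' is what this thread
proposes, and a "nothing to do" value is what's floated above (the last
two are sketches, not implemented options):

ALTER SYSTEM SET recovery_init_sync_method = 'syncfs';  -- existing (Linux)
-- proposed in this thread:
-- ALTER SYSTEM SET recovery_init_sync_method = 'wal';
-- hypothetical "backup tool already fsynced everything":
-- ALTER SYSTEM SET recovery_init_sync_method = 'none';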

Otherwise, just in general, agree with doing this to address the risks
discussed around regular crash recovery.  We have some pretty clear "if
the DB was doing recovery and was interrupted, you need to restore from
backup" messages today in xlog.c, and this patch didn't seem to change
that?  Am I missing something or isn't the idea here that these changes
would make it so you aren't going to end up with corruption in those
cases?  Specifically looking at-

xlog.c:6509-
case DB_IN_CRASH_RECOVERY:
    ereport(LOG,
            (errmsg("database system was interrupted while in recovery at %s",
                    str_time(ControlFile->time)),
             errhint("This probably means that some data is corrupted and"
                     " you will have to use the last backup for recovery.")));
    break;

case DB_IN_ARCHIVE_RECOVERY:
    ereport(LOG,
            (errmsg("database system was interrupted while in recovery at log time %s",
                    str_time(ControlFile->checkPointCopy.time)),
             errhint("If this has occurred more than once some data might be corrupted"
                     " and you might need to choose an earlier recovery target.")));
    break;

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-21 Thread Tom Lane
Chapman Flack  writes:
> If this turns out to be a case of "attached the wrong patch, here's
> the one that does implement foo_regex functions!" then I reserve an
> objection to that. :)

+1 to that.  Just to add a note, I do have some ideas about extending
our regex parser so that it could duplicate the XQuery syntax --- none
of the points we mention in 9.7.3.8 seem insurmountable.  I'm not
planning to work on that in the near future, mind you, but I definitely
think that we don't want to paint ourselves into a corner where we've
already implemented the XQuery regex functions with the wrong behavior.

regards, tom lane




Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-21 Thread Chapman Flack
On 03/21/21 09:19, Gilles Darold wrote:
>>> On 2021.03.20. 19:48 Gilles Darold  wrote:
>>>  
>>> This is a new version of the patch that now implements all the XQUERY
>>> regexp functions as described in the standard, minus the differences of
>>> PostgreSQL regular expressions explained in [1].
>>>
>>> The standard SQL describe functions like_regex(), occurrences_regex(),
>>> position_regex(), substring_regex() and translate_regex() which
>>> correspond to the commonly named functions regexp_like(),
>>> regexp_count(), regexp_instr(), regexp_substr() and regexp_replace() as
>>> reported by Chapman Flack in [2]. All these function are implemented in
>>> [v2-0001-xquery-regexp-functions.patch]

I quickly looked over this patch preparing to object if it actually
purported to implement the ISO foo_regex() named functions without
the ISO semantics, but a quick grep reassured me that it doesn't
implement any of those functions. It only supplies functions in
the alternative, apparently common de facto naming scheme regexp_foo().

To be clear, I think that's the right call. I do not think it would be
a good idea to supply functions that have the ISO names but not the
specified regex dialect.

A set of functions analogous to the ISO ones but differently named and
with a different regex dialect seems fine to me, especially if these
different names are de facto common, and as far as I can tell, that is
what this patch provides. So I have no objection to that. :)

It might then be fair to say that the /description/ of the patch as
implementing the XQuery-based foo_regex functions isn't quite right,
or at least carries a risk of jarring some readers into hasty
double-takes on Sunday mornings before coffee.

It might be clearer to just mention the close correspondence between
the functions in this differently-named set and the corresponding ISO ones.

If this turns out to be a case of "attached the wrong patch, here's
the one that does implement foo_regex functions!" then I reserve an
objection to that. :)

Regards,
-Chap




Re: invalid data in file backup_label problem on windows

2021-03-21 Thread Magnus Hagander
On Sat, Mar 20, 2021 at 3:10 AM wangsh.f...@fujitsu.com
 wrote:
>
> David Steele  wrote:
>
> > It's not clear to me what text editors have to do with this? Are you
> > editing the file manually?
>
> When I execute SELECT * FROM pg_stop_backup(false, true) in psql.
>
> The results will be shown like:
>     lsn    |                        labelfile                         | spcmapfile
> -----------+----------------------------------------------------------+------------
>  0/2000138 | START WAL LOCATION: 0/228 (file 00010002)               +|
>            | CHECKPOINT LOCATION: 0/260                              +|
>            | BACKUP METHOD: streamed                                 +|
>            | BACKUP FROM: master                                     +|
> ...
> The results are only shown on screen; this function does not generate any
> files. What I do is write the second field (labelfile) to a new file,
> backup_label, and write the third field (spcmapfile) to tablespace_map if
> the third field is not null.
>
> Therefore, I use a text editor to help me write the file.
> I copy a line from the second field, paste it into the text editor, and
> press the 'enter' key, repeating this action until all the lines have been
> pasted into the text editor, then save the file.
>
> If this is not a good way to create the backup_label file, can you tell me
> how I can create this file on Windows?

These APIs are really not designed to be run manually from a CLI and
copy/paste the results.

Running them from literally any script or program should make that
easy, by getting the actual value out and storing it.


> I think the real problem is that this file is opened in binary mode on
> Windows. If I use libpq to get the result and write the result to a file
> directly (the default action on Windows is to open the file in text mode),
> this problem will happen. So I consider this a bug.

No, the problem is you are using copy/paste and in doing so you are
*changing* the value that is being returned. You'll either need to
update your copy/paste procedure to not mess with the newlines, or to
use a better way to get the data out.

If we need to clarify that in the documentation, I'm fine with that.
Maybe add an extra sentence to the part about not modifying the output
to mention that this includes changing newlines and also encoding
(which would also break it, if you managed to find a non-ascii
compatible encoding). Maybe even something along the lines of "the
contents have to be written in binary mode"?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Log message for GSS connection is missing once connection authorization is successful.

2021-03-21 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Sat, Mar 20, 2021 at 05:37:47PM +0900, Michael Paquier wrote:
> > It seems to me that this would make the tests faster, that the test
> > would not need to wait for the logging collector and that the code
> > could just use slurp_file($node->logfile) to get the data it wants to
> > check for a given pattern without looking at current_logfiles.  I also
> > think that not using truncate() on the logfile generated has the
> > disadvantage to make the code fuzzy for its verification once we
> > introduce patterns close to each other, as there could easily be an
> > overlap.  That's one problem that SQL pattern checks had to deal with
> > in the past.  Thoughts?
> 
> And, in terms of code, this really simplifies things.  Please see the
> attached that I would like to apply.

Agreed, that does look better/simpler.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: support for MERGE

2021-03-21 Thread Magnus Hagander
On Sun, Mar 21, 2021 at 2:57 PM Alvaro Herrera  wrote:
>
> On 2021-Mar-21, Magnus Hagander wrote:
>
> > On Fri, Mar 19, 2021 at 6:03 PM Bruce Momjian  wrote:
> > >
> > > Let's get someone to go into the "database" and adjust it.  ;-)
> >
> > Yeah, we probably can clean that up on the database side :)
>
> Thank you!
>
> > But what is it we *want* it to do? That is, what should be the target?
> > Is it 2021-07 with status WoA?
>
> Yeah, that sounds like it, thanks.

Done.

The log entry is still there, to show what has been done :)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: support for MERGE

2021-03-21 Thread Alvaro Herrera
On 2021-Mar-21, Magnus Hagander wrote:

> On Fri, Mar 19, 2021 at 6:03 PM Bruce Momjian  wrote:
> >
> > Let's get someone to go into the "database" and adjust it.  ;-)
> 
> Yeah, we probably can clean that up on the database side :)

Thank you!

> But what is it we *want* it to do? That is, what should be the target?
> Is it 2021-07 with status WoA?

Yeah, that sounds like it, thanks.

-- 
Álvaro Herrera   Valdivia, Chile




Re: support for MERGE

2021-03-21 Thread Magnus Hagander
On Fri, Mar 19, 2021 at 6:03 PM Bruce Momjian  wrote:
>
> On Fri, Mar 19, 2021 at 10:53:53AM -0400, David Steele wrote:
> > On 3/19/21 10:44 AM, Alvaro Herrera wrote:
> > > On 2021-Mar-19, David Steele wrote:
> > >
> > > > Since it does not appear this is being worked on for PG14 perhaps we 
> > > > should
> > > > close it in this CF and then reopen it when a patch is available?
> > >
> > > No, please move it to the next CF.  Thanks
> >
> > Ugh. I clicked wrong and moved it to the 2021-09 CF. I'm not sure if there's
> > a way to fix it without creating a new entry.
>
> Let's get someone to go into the "database" and adjust it.  ;-)

Yeah, we probably can clean that up on the database side :)

But what is it we *want* it to do? That is, what should be the target?
Is it 2021-07 with status WoA?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-21 Thread Gilles Darold
On 21/03/2021 at 12:07, e...@xs4all.nl wrote:
>> On 2021.03.20. 19:48 Gilles Darold  wrote:
>>  
>> This is a new version of the patch that now implements all the XQUERY
>> regexp functions as described in the standard, minus the differences of
>> PostgreSQL regular expressions explained in [1].
>>
>> The standard SQL describe functions like_regex(), occurrences_regex(),
>> position_regex(), substring_regex() and translate_regex() which
>> correspond to the commonly named functions regexp_like(),
>> regexp_count(), regexp_instr(), regexp_substr() and regexp_replace() as
>> reported by Chapman Flack in [2]. All these function are implemented in
>> [v2-0001-xquery-regexp-functions.patch]
> Hi,
>
> Apply, compile and (world)check are fine. I haven't found errors in 
> functionality.
>
> I went through the docs, and came up with these changes in func.sgml, and 
> pg_proc.dat.
>
> Useful functions - thanks!
>
> Erik Rijkers


Thanks a lot Erik, here is a version of the patch with your corrections.


-- 
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fee0561961..36e446cb7b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3097,6 +3097,66 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text [, position integer ] [, flags text ] )
+integer
+   
+   
+Returns the number of times a POSIX regular expression pattern
+matches in the string; see
+.
+   
+   
+regexp_count('123456789012', '\d{3}', 3)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
+regexp_instr ( string text, pattern text [, position integer ] [, occurrence integer ] [, returnopt integer ] [, flags text ] [, group integer ] )
+integer
+   
+   
+   Returns the position within string where the
+   match of a POSIX regular expression occurs, as an integer
+   indicating the beginning or ending position of the matched substring,
+   depending on the value of the returnopt argument
+   (the default is the beginning). If no match is found, the function returns 0;
+   see .
+   
+   
+regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)
+7
+   
+  
+
+  
+   
+
+ regexp_like
+
+regexp_like ( string text, pattern text [, flags text ] )
+boolean
+   
+   
+Checks whether a POSIX regular expression matches somewhere
+in string; see .
+   
+   
+regexp_like('Hello'||chr(10)||'world', '^world$', 'm')
+t
+   
+  
+
+
   

 
@@ -3144,7 +3204,7 @@ repeat('Pg', 4) PgPgPgPg
 
  regexp_replace
 
-regexp_replace ( string text, pattern text, replacement text [, flags text ] )
+regexp_replace ( string text, pattern text, replacement text [, position integer ] [, occurrence integer ] [, flags text ] )
 text


@@ -3157,6 +3217,24 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_substr
+
+regexp_substr ( string text, pattern text [, position integer ] [, occurrence integer ] [, flags text ] [, group integer ] )
+text
+   
+   
+   Returns the substring within string corresponding to the
+   match of a POSIX regular expression; see .
+   
+   
+regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)
+55
+   
+  
+
   

 
@@ -5295,6 +5373,18 @@ substring('foobar' similar '#"o_b#"%' escape '#')NULL
 regexp_split_to_array

+   
+regexp_like
+   
+   
+regexp_count
+   
+   
+regexp_instr
+   
+   
+regexp_substr
+   
 

  lists the available
@@ -5451,6 +5541,8 @@ substring('foobar' from 'o(.)b')   o
  It has the syntax
  regexp_replace(source,
  pattern, replacement
+ , position 
+ , occurrence 
  , flags ).
  The source string is returned unchanged if
  there is no match to the pattern.  If there is a
@@ -5464,12 +5556,19 @@ substring('foobar' from 'o(.)b')   o
  substring matching the entire pattern should be inserted.  Write
  \\ if you need to put a literal backslash in the replacement
  text.
- The flags parameter is an optional text
- string containing zero or more single-letter flags that change the
+ The pattern is searched for in string starting
+ from the optional position, or from the beginning
+ of source by default.  The optional occurrence
+ parameter indicates which occurrence of pattern in
+ source should be replaced.  The default value for occurrence
+ is 1, meaning only the first occurrence is replaced.  The flags parameter
+ is an optional text string 
