Re: do only critical work during single-user vacuum?

2021-12-20 Thread Masahiko Sawada
On Tue, Dec 21, 2021 at 1:53 PM Peter Geoghegan  wrote:
>
> On Mon, Dec 20, 2021 at 8:40 PM Masahiko Sawada  wrote:
> > BTW a vacuum automatically enters failsafe mode under the situation
> > where the user has to run a vacuum in the single-user mode, right?
>
> Only for the table that had the problem. Maybe there are no other
> tables that a database level "VACUUM" will need to spend much time on,
> or maybe there are, and they will make it take much much longer (it
> all depends).
>
> The goal of the patch is to make sure that when we're in single user
> mode, we'll consistently trigger the failsafe, for every VACUUM
> against every table -- not just the table (or tables) whose
> relfrozenxid is very old. That's still naive, but much less naive than
> simply telling users to VACUUM the whole database in single user mode
> while vacuuming indexes, etc.

I understand the patch, thank you for the explanation!

I remember Simon proposed a VACUUM command option[1], called
FAST_FREEZE, to turn off index cleanup and heap truncation. Now that
we have the failsafe mechanism, perhaps we can have a VACUUM command
option to turn on failsafe mode instead.
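As a rough illustration, the decision the patch aims for can be modeled as follows. The function name is made up for this sketch; the 1.6 billion default corresponds to PostgreSQL's vacuum_failsafe_age setting:

```python
def failsafe_triggered(relfrozenxid_age, single_user_mode,
                       vacuum_failsafe_age=1_600_000_000):
    # Sketch of the proposed behavior: in single-user mode, always
    # engage the failsafe (skip index vacuuming, heap truncation, etc.);
    # otherwise engage it only once the table's relfrozenxid age
    # crosses the cutoff.
    return single_user_mode or relfrozenxid_age >= vacuum_failsafe_age
```

A VACUUM option as proposed above would effectively let the user force the first branch without being in single-user mode.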

Regards,

[1] https://commitfest.postgresql.org/32/2908/

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Add id's to various elements in protocol.sgml

2021-12-20 Thread Brar Piening

On 20.12.2021 at 16:09, Robert Haas wrote:

As a data point, this is something I have also wanted to do, from time
to time. I am generally of the opinion that any place where the
documentation has a long list of things should have ids added, so that
people can link to the particular thing in the list to which they want
to draw someone's attention.


Thank you.

If there is consensus on generally adding ids to long lists, I'd take
suggestions for other places where people think that this would make
sense and amend my patch.






Re: row filtering for logical replication

2021-12-20 Thread Amit Kapila
On Tue, Dec 21, 2021 at 10:53 AM vignesh C  wrote:
>
> On Mon, Dec 20, 2021 at 8:41 AM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Fri, Dec 17, 2021 6:09 PM Amit Kapila  wrote:
> > > On Fri, Dec 17, 2021 at 4:11 AM Peter Smith  wrote:
> > > >
> > > > PSA the v47* patch set.
> > Thanks for the comments, I agree with all the comments.
> > Attach the V49 patch set, which addressed all the above comments on the 0002
> > patch.
>
> While reviewing the patch, I was testing a scenario where we change
> the row filter condition and refresh the publication, in this case we
> do not identify the row filter change and the table data is not synced
> with the publisher. In case of setting the table, we sync the data
> from the publisher.
>

We only sync data if the table is added after the last Refresh or
Create Subscription. Even if we decide to sync the data again due to
row filter change, it can easily create conflicts with already synced
data. So, this seems expected behavior and we can probably document
it.

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2021-12-20 Thread Amit Kapila
On Tue, Dec 21, 2021 at 6:17 AM Ajin Cherian  wrote:
>
> On Tue, Dec 21, 2021 at 5:58 AM Euler Taveira  wrote:
> >
> >In pgoutput_row_filter_update(), first, we are deforming the tuple in
> >local datum, then modifying the tuple, and then reforming the tuple.
> >I think we can surely do better here. Currently, you are reforming
> >the tuple so that you can store it in the scan slot by calling
> >ExecStoreHeapTuple which will be used for expression evaluation.
> >Instead of that what you need to do is to deform the tuple using
> >tts_values of the scan slot and later call ExecStoreVirtualTuple(), so
> >advantages are 1) you don't need to reform the tuple 2) the expression
> >evaluation machinery doesn't need to deform again for fetching the
> >value of the attribute, instead it can directly get from the value
> >from the virtual tuple.
>
> Storing the old tuple/new tuple in a slot and re-using the slot avoids
> the overhead of continuously deforming the tuple at multiple levels in
> the code.
>

Yeah, deforming tuples again can have a significant cost, but what is
the need to maintain tmp_new_tuple in relsyncentry? I think it is
required only in rare cases, so we can probably allocate/deallocate it
when required.
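The saving being discussed — deform the tuple once into the slot's arrays and let later attribute fetches reuse the cached values — can be modeled with a toy slot. All names here are hypothetical illustrations, not the pgoutput code:

```python
class ToySlot:
    """Toy model of a tuple table slot: deform the raw tuple once,
    cache the per-attribute values, and serve later lookups from the
    cache (mirroring how a virtual tuple serves expression evaluation
    without re-deforming)."""

    def __init__(self, raw_tuple, deform_fn):
        self._raw = raw_tuple
        self._deform_fn = deform_fn   # raw tuple -> list of values
        self._values = None
        self.deform_count = 0         # for demonstration only

    def getattr(self, attnum):
        # Deform lazily, exactly once; subsequent fetches are array reads.
        if self._values is None:
            self._values = self._deform_fn(self._raw)
            self.deform_count += 1
        return self._values[attnum]
```

With this model, evaluating a multi-attribute filter expression touches the deform step once, rather than once per attribute or once per evaluation level.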

Few other comments:
==
1.
  TupleTableSlot *scantuple; /* tuple table slot for row filter */
+ TupleTableSlot *new_tuple; /* slot for storing deformed new tuple during updates */
+ TupleTableSlot *old_tuple; /* slot for storing deformed old tuple during updates */

I think it is better to name these as scan_slot, new_slot, old_slot to
avoid confusion with tuples.

2.
+++ b/src/backend/replication/logical/proto.c
@@ -19,6 +19,7 @@
 #include "replication/logicalproto.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+#include "executor/executor.h"

The include is in the wrong order. We keep includes in alphabetical order.

3.
@@ -832,6 +847,7 @@ logicalrep_write_tuple(StringInfo out, Relation rel, HeapTuple tuple, bool binary)

  ReleaseSysCache(typtup);
  }
+
 }

Spurious addition.

4.
-logicalrep_write_tuple(StringInfo out, Relation rel, HeapTuple tuple, bool binary)
+logicalrep_write_tuple(StringInfo out, Relation rel, HeapTuple tuple, TupleTableSlot *slot,
+bool binary)

The formatting is quite off. Please run pgindent.

5. If we decide to go with this approach then I feel let's merge the
required comments from Euler's version.



-- 
With Regards,
Amit Kapila.




RE: row filtering for logical replication

2021-12-20 Thread tanghy.f...@fujitsu.com
On Tuesday, December 21, 2021 3:03 PM, tanghy.f...@fujitsu.com 
 wrote:
> 
> On Monday, December 20, 2021 4:47 PM tanghy.f...@fujitsu.com
>  wrote:
> > On Monday, December 20, 2021 11:24 AM tanghy.f...@fujitsu.com
> > 
> > >
> > > On Wednesday, December 8, 2021 2:29 PM Amit Kapila
> > >  wrote:
> > > >
> > > > On Mon, Dec 6, 2021 at 6:04 PM Euler Taveira  wrote:
> > > > >
> > > > > On Mon, Dec 6, 2021, at 3:35 AM, Dilip Kumar wrote:
> > > > >
> > > > > On Mon, Dec 6, 2021 at 6:49 AM Euler Taveira  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote:
> > > > > >
> > > > > > PS> I will update the commit message in the next version. I barely
> changed
> > > the
> > > > > > documentation to reflect the current behavior. I probably missed 
> > > > > > some
> > > > changes
> > > > > > but I will fix in the next version.
> > > > > >
> > > > > > I realized that I forgot to mention a few things about the UPDATE 
> > > > > > behavior.
> > > > > > Regardless of 0003, we need to define which tuple will be used to 
> > > > > > evaluate
> > > the
> > > > > > row filter for UPDATEs. We already discussed it circa [1]. This 
> > > > > > current
> > version
> > > > > > chooses *new* tuple. Is it the best choice?
> > > > >
> > > > > But with 0003, we are using both the tuple for evaluating the row
> > > > > filter, so instead of fixing 0001, why we don't just merge 0003 with
> > > > > 0001?  I mean eventually, 0003 is doing what is the agreed behavior,
> > > > > i.e. if just OLD is matching the filter then convert the UPDATE to
> > > > > DELETE OTOH if only new is matching the filter then convert the UPDATE
> > > > > to INSERT.  Do you think that even we merge 0001 and 0003 then also
> > > > > there is an open issue regarding which row to select for the filter?
> > > > >
> > > > > Maybe I was not clear. IIUC we are still discussing 0003 and I would
> like
> > to
> > > > > propose a different default based on the conclusion I came up. If we
> merged
> > > > > 0003, that's fine; this change will be useless. If we don't or it is
> optional,
> > > > > it still has its merit.
> > > > >
> > > > > Do we want to pay the overhead to evaluating both tuple for UPDATEs?
> I'm
> > still
> > > > > processing if it is worth it. If you think that in general the row 
> > > > > filter
> > > > > contains the primary key and it is rare to change it, it will waste 
> > > > > cycles
> > > > > evaluating the same expression twice. It seems this behavior could be
> > > > > controlled by a parameter.
> > > > >
> > > >
> > > > I think the first thing we should do in this regard is to evaluate the
> > > > performance for both cases (when we apply a filter to both tuples vs.
> > > > to one of the tuples). In case the performance difference is
> > > > unacceptable, I think it would be better to still compare both tuples
> > > > as default to avoid data inconsistency issues and have an option to
> > > > allow comparing one of the tuples.
> > > >
> > >
> > > I did some performance tests to see if 0003 patch has much overhead.
> > > With which I compared applying first two patches and applying first three
> patches
> > > in four cases:
> > > 1) only old rows match the filter.
> > > 2) only new rows match the filter.
> > > 3) both old rows and new rows match the filter.
> > > 4) neither old rows nor new rows match the filter.
> > >
> > > 0003 patch checks both old rows and new rows, and without 0003 patch, it
> only
> > > checks either old or new rows. We want to know whether it would take more
> time
> > > if we check the old rows.
> > >
> > > I ran the tests in asynchronous mode and compared the SQL execution time.
> I
> > also
> > > tried some complex filters, to see if the difference could be more 
> > > obvious.
> > >
> > > The result and the script are attached.
> > > I didn’t see big difference between the result of applying 0003 patch and
> the
> > > one not in all cases. So I think 0003 patch doesn’t have much overhead.
> > >
> >
> > In the previous test, I ran 3 times and took the average value, which
> > may be affected by performance fluctuations.
> >
> > So, to make the results more accurate, I tested them more times (10 times)
> and
> > took the average value. The result is attached.
> >
> > In general, I can see the time difference is within 3.5%, which is in
> > a reasonable performance range, I think.
> >
> 
> Hi,
> 
> I ran tests for various percentages of rows being filtered (based on v49 
> patch).
> The result and the script are attached.
> 
> In synchronous mode, with row filter patch, the fewer rows match the
> row filter, the less time it took.

RE: row filtering for logical replication

2021-12-20 Thread tanghy.f...@fujitsu.com
On Monday, December 20, 2021 4:47 PM tanghy.f...@fujitsu.com 
 wrote:
> On Monday, December 20, 2021 11:24 AM tanghy.f...@fujitsu.com
> 
> >
> > On Wednesday, December 8, 2021 2:29 PM Amit Kapila
> >  wrote:
> > >
> > > On Mon, Dec 6, 2021 at 6:04 PM Euler Taveira  wrote:
> > > >
> > > > On Mon, Dec 6, 2021, at 3:35 AM, Dilip Kumar wrote:
> > > >
> > > > On Mon, Dec 6, 2021 at 6:49 AM Euler Taveira  wrote:
> > > > >
> > > > > On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote:
> > > > >
> > > > > PS> I will update the commit message in the next version. I barely 
> > > > > changed
> > the
> > > > > documentation to reflect the current behavior. I probably missed some
> > > changes
> > > > > but I will fix in the next version.
> > > > >
> > > > > I realized that I forgot to mention a few things about the UPDATE 
> > > > > behavior.
> > > > > Regardless of 0003, we need to define which tuple will be used to 
> > > > > evaluate
> > the
> > > > > row filter for UPDATEs. We already discussed it circa [1]. This 
> > > > > current
> version
> > > > > chooses *new* tuple. Is it the best choice?
> > > >
> > > > But with 0003, we are using both the tuple for evaluating the row
> > > > filter, so instead of fixing 0001, why we don't just merge 0003 with
> > > > 0001?  I mean eventually, 0003 is doing what is the agreed behavior,
> > > > i.e. if just OLD is matching the filter then convert the UPDATE to
> > > > DELETE OTOH if only new is matching the filter then convert the UPDATE
> > > > to INSERT.  Do you think that even we merge 0001 and 0003 then also
> > > > there is an open issue regarding which row to select for the filter?
> > > >
> > > > Maybe I was not clear. IIUC we are still discussing 0003 and I would 
> > > > like
> to
> > > > propose a different default based on the conclusion I came up. If we 
> > > > merged
> > > > 0003, that's fine; this change will be useless. If we don't or it is 
> > > > optional,
> > > > it still has its merit.
> > > >
> > > > Do we want to pay the overhead to evaluating both tuple for UPDATEs? I'm
> still
> > > > processing if it is worth it. If you think that in general the row 
> > > > filter
> > > > contains the primary key and it is rare to change it, it will waste 
> > > > cycles
> > > > evaluating the same expression twice. It seems this behavior could be
> > > > controlled by a parameter.
> > > >
> > >
> > > I think the first thing we should do in this regard is to evaluate the
> > > performance for both cases (when we apply a filter to both tuples vs.
> > > to one of the tuples). In case the performance difference is
> > > unacceptable, I think it would be better to still compare both tuples
> > > as default to avoid data inconsistency issues and have an option to
> > > allow comparing one of the tuples.
> > >
> >
> > I did some performance tests to see if 0003 patch has much overhead.
> > With which I compared applying first two patches and applying first three 
> > patches
> > in four cases:
> > 1) only old rows match the filter.
> > 2) only new rows match the filter.
> > 3) both old rows and new rows match the filter.
> > 4) neither old rows nor new rows match the filter.
> >
> > 0003 patch checks both old rows and new rows, and without 0003 patch, it 
> > only
> > checks either old or new rows. We want to know whether it would take more 
> > time
> > if we check the old rows.
> >
> > I ran the tests in asynchronous mode and compared the SQL execution time. I
> also
> > tried some complex filters, to see if the difference could be more obvious.
> >
> > The result and the script are attached.
> > I didn’t see big difference between the result of applying 0003 patch and 
> > the
> > one not in all cases. So I think 0003 patch doesn’t have much overhead.
> >
> 
> In the previous test, I ran 3 times and took the average value, which
> may be affected by performance fluctuations.
> 
> So, to make the results more accurate, I tested them more times (10 times) and
> took the average value. The result is attached.
> 
> In general, I can see the time difference is within 3.5%, which is in
> a reasonable performance range, I think.
> 

Hi,

I ran tests for various percentages of rows being filtered (based on v49 
patch). The result and the script are attached.

In synchronous mode, with row filter patch, the fewer rows match the row 
filter, the less time it took. 
In the case that all rows match the filter, row filter patch took about the 
same time as the one on HEAD code.

In asynchronous mode, I could see time is reduced when the percentage of rows 
sent is small (<25%), other cases took about the same time as the one on HEAD.

I think the above result is good. It shows that row filter patch doesn’t have 
much overhead.
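For reference, the 0003 semantics being benchmarked above — convert an UPDATE to DELETE or INSERT depending on which of the two tuples passes the row filter — can be sketched in miniature. This is a toy model of the agreed behavior, not the patch code:

```python
def transform_update(old_matches, new_matches):
    """Toy model of row-filter handling for UPDATE: both tuples are
    evaluated against the publication's filter, and the action sent to
    the subscriber depends on which of them matches."""
    if old_matches and new_matches:
        return "UPDATE"   # row stays inside the published set
    if old_matches:
        return "DELETE"   # row leaves the published set
    if new_matches:
        return "INSERT"   # row enters the published set
    return None           # neither matches: nothing is replicated
```

The per-UPDATE overhead under discussion is exactly the second filter evaluation this model implies.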

Regards,
Tang


Performance_reports.xlsx
Description: Performance_reports.xlsx


perf_various_percentages.sh
Description: perf_various_percentages.sh


Re: [Proposal] Global temporary tables

2021-12-20 Thread Andrew Bille
Hi!
Thanks for new patches.
Yet another crash reproduced on master with v63 patches:

CREATE TABLESPACE ts LOCATION '/tmp/ts';
CREATE GLOBAL TEMP TABLE tbl (num1 bigint);
INSERT INTO tbl (num1) values (1);
CREATE INDEX tbl_idx ON tbl (num1);
REINDEX (TABLESPACE ts) TABLE tbl;

Got error:
CREATE TABLESPACE
CREATE TABLE
INSERT 0 1
CREATE INDEX
WARNING:  AbortTransaction while in COMMIT state
ERROR:  gtt relfilenode 16388 not found in rel 16388
PANIC:  cannot abort transaction 726, it was already committed
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost

in log:
2021-12-21 12:54:08.273 +07 [208725] ERROR:  gtt relfilenode 16388 not
found in rel 16388
2021-12-21 12:54:08.273 +07 [208725] STATEMENT:  REINDEX (TABLESPACE ts)
TABLE tbl;
2021-12-21 12:54:08.273 +07 [208725] WARNING:  AbortTransaction while in
COMMIT state
2021-12-21 12:54:08.273 +07 [208725] PANIC:  cannot abort transaction 726,
it was already committed
2021-12-21 12:54:08.775 +07 [208716] LOG:  server process (PID 208725) was
terminated by signal 6: Aborted
2021-12-21 12:54:08.775 +07 [208716] DETAIL:  Failed process was running:
REINDEX (TABLESPACE ts) TABLE tbl;
2021-12-21 12:54:08.775 +07 [208716] LOG:  terminating any other active
server processes
2021-12-21 12:54:08.775 +07 [208716] LOG:  all server processes terminated;
reinitializing

with dump:
[New LWP 208725]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: andrew postgres [local] REINDEX
   '.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x7feadfac7859 in __GI_abort () at abort.c:79
#2  0x55e36b6d9ec7 in errfinish (filename=0x55e36b786e20 "xact.c",
lineno=1729, funcname=0x55e36b788660 <__func__.29619>
"RecordTransactionAbort") at elog.c:680
#3  0x55e36b0d6e37 in RecordTransactionAbort (isSubXact=false) at
xact.c:1729
#4  0x55e36b0d7f64 in AbortTransaction () at xact.c:2787
#5  0x55e36b0d88fa in AbortCurrentTransaction () at xact.c:3315
#6  0x55e36b524f33 in PostgresMain (dbname=0x55e36d4d97b8 "postgres",
username=0x55e36d4d9798 "andrew") at postgres.c:4252
#7  0x55e36b44d1e0 in BackendRun (port=0x55e36d4d1020) at
postmaster.c:4594
#8  0x55e36b44cac5 in BackendStartup (port=0x55e36d4d1020) at
postmaster.c:4322
#9  0x55e36b448bad in ServerLoop () at postmaster.c:1802
#10 0x55e36b448346 in PostmasterMain (argc=3, argv=0x55e36d4a84d0) at
postmaster.c:1474
#11 0x55e36b33b5ca in main (argc=3, argv=0x55e36d4a84d0) at main.c:198

Regards!

On Mon, Dec 20, 2021 at 7:42 PM wenjing zeng  wrote:

> Posted GTT v63 to fix conflicts with the latest code.
>
>
>
> Hi Andrew
>
> Have you found any new bugs recently?
>
>
>
> Wenjing
>
>
>
>
> On Nov 20, 2021, at 01:31, wenjing  wrote:
>
>
>
> On Mon, Nov 15, 2021 at 6:34 PM, Andrew Bille  wrote:
>
>> Thanks for the patches. The feature has become much more stable.
>> However, there is another simple case that generates an error:
>> Master with v61 patches
>>
>> CREATE GLOBAL TEMPORARY TABLE t AS SELECT 1 AS a;
>> ERROR:  could not open file "base/13560/t3_16384": No such file or
>> directory
>>
> Thank you for pointing out that this part is not reasonable enough.
> This issue has been fixed in v62.
> Looking forward to your reply.
>
>
> Wenjing
>
>
>
>> Andrew
>>
>> On Thu, Nov 11, 2021 at 3:15 PM wenjing  wrote:
>>
>>> Fixed a bug in function pg_gtt_attached_pid.
>>> Looking forward to your reply.
>>>
>>>
>>> Wenjing
>>>
>>>
>
>
> <0001-gtt-v62-reademe.patch><0004-gtt-v62-regress.patch>
> <0002-gtt-v62-doc.patch><0003-gtt-v62-implementation.patch>
>
>
>
>
>


Re: parallel vacuum comments

2021-12-20 Thread Masahiko Sawada
On Tue, Dec 21, 2021 at 2:04 PM Amit Kapila  wrote:
>
> On Tue, Dec 21, 2021 at 10:05 AM Masahiko Sawada  
> wrote:
> >
> > On Tue, Dec 21, 2021 at 12:05 PM Amit Kapila  
> > wrote:
> > >
> > > On Mon, Dec 20, 2021 at 6:29 PM Masahiko Sawada  
> > > wrote:
> >
> > >  BTW, if we go with that then we should set the correct phase
> > > for workers as well?
> >
> > If we have separate error context for the leader (vacuumlazy.c) and
> > workers (vacuumparallel.c), workers don't necessarily need to have the
> > phases such as  VACUUM_ERRCB_PHASE_VACUUM_INDEX and
> > VACUUM_ERRCB_PHASE_INDEX_CLEANUP. They can use PVIndVacStatus in the
> > error callback function as the patch does.
> >
>
> Okay. One minor point, let's change comments atop vacuum.c considering
> the movement of new functions.

Thank you for the comment. Agreed.

I've attached updated version patches. Please review them.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v10-0001-Move-index-vacuum-routines-to-vacuum.c.patch
Description: Binary data


v10-0002-Move-parallel-vacuum-code-to-vacuumparallel.c.patch
Description: Binary data


Re: Multi-Column List Partitioning

2021-12-20 Thread Ashutosh Sharma
On Mon, Dec 20, 2021 at 7:04 PM Amit Langote 
wrote:

> Hi,
>
> On Mon, Dec 13, 2021 at 11:37 PM Ashutosh Sharma 
> wrote:
> >
> > Hi,
> >
> > Is this okay?
> >
> > postgres=# CREATE TABLE t1 (a int, b int) PARTITION BY LIST ( a, a, a );
> > CREATE TABLE
> >
> > postgres=# CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN ((1, 2, 3),
> (4, 5, 6));
> > CREATE TABLE
> >
> > postgres=# \d t1
> >Partitioned table "public.t1"
> >  Column |  Type   | Collation | Nullable | Default
> > --------+---------+-----------+----------+---------
> >  a  | integer |   |  |
> >  b  | integer |   |  |
> > Partition key: LIST (a, a, a)
> > Number of partitions: 1 (Use \d+ to list them.)
>
> I'd say it's not okay for a user to expect this to work sensibly, and
> I don't think it would be worthwhile to write code to point that out
> to the user if that is what you were implying.
>

OK. As you wish.

--
With Regards,
Ashutosh Sharma.


Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-12-20 Thread Ashutosh Sharma
I am getting the below error when running the same test-case that Neha
shared in her previous email.

ERROR:  55000: some relations of database "test1" are already in tablespace
"tab1"
HINT:  You must move them back to the database's default tablespace before
using this command.
LOCATION:  movedb, dbcommands.c:1555

test-case:

create tablespace tab1 location '/home/ashu/test1';
create tablespace tab location '/home/ashu/test';

create database test tablespace tab;
\c test

create table t(a int primary key, b text);

create or replace function large_val() returns text language sql as 'select
array_agg(md5(g::text))::text from generate_series(1, 256) g';

insert into t values (generate_series(1,10), large_val());

alter table t set tablespace tab1 ;

\c postgres
create database test1 template test;

\c test1
alter table t set tablespace tab;

\c postgres
alter database test1 set tablespace tab1; -- this fails with  the given
error.

Observations:
===
Please note that before running the above alter database statement, the
table 't' was moved from tablespace 'tab1' to 'tab', so it is not clear
why ReadDir() is returning an entry for table 't' in tablespace 'tab1'.
It should have returned NULL here:

while ((xlde = ReadDir(dstdir, dst_dbpath)) != NULL)
{
    if (strcmp(xlde->d_name, ".") == 0 ||
        strcmp(xlde->d_name, "..") == 0)
        continue;

    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("some relations of database \"%s\" are already in tablespace \"%s\"",
                    dbname, tblspcname),
             errhint("You must move them back to the database's default tablespace before using this command.")));
}

Also, if I run a checkpoint explicitly before executing the above alter
database statement, this error doesn't appear, which means it only
happens with the new changes, because earlier we were forcing a
checkpoint at the end of the createdb statement.
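The movedb() loop quoted above amounts to "error out if any directory entry other than `.` and `..` remains". A toy model of that rule (hypothetical names, not the server code):

```python
def check_dst_tablespace_empty(entries):
    """Toy model of movedb()'s check: given the directory entries for
    the database under the destination tablespace, raise if any
    relation file remains ('.' and '..' are skipped, as in the C loop)."""
    for name in entries:
        if name in ('.', '..'):
            continue
        raise RuntimeError(
            'some relations of database are already in tablespace: ' + name)
    return True
```

The report above suggests a stale relfilenode entry survives in that directory until a checkpoint, which is why the explicit checkpoint makes the error disappear.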

--
With Regards,
Ashutosh Sharma.

On Thu, Dec 16, 2021 at 9:26 PM Neha Sharma 
wrote:

> Hi,
>
> While testing the v8 patches in a hot-standby setup, it was observed the
> master is crashing with the below error;
>
> 2021-12-16 19:32:47.757 +04 [101483] PANIC:  could not fsync file
> "pg_tblspc/16385/PG_15_202112111/16386/16391": No such file or directory
> 2021-12-16 19:32:48.917 +04 [101482] LOG:  checkpointer process (PID
> 101483) was terminated by signal 6: Aborted
>
> Parameters configured at master:
> wal_level = hot_standby
> max_wal_senders = 3
> hot_standby = on
> max_standby_streaming_delay= -1
> wal_consistency_checking='all'
> max_wal_size= 10GB
> checkpoint_timeout= 1d
> log_min_messages=debug1
>
> Test Case:
> create tablespace tab1 location
> '/home/edb/PGsources/postgresql/inst/bin/test1';
> create tablespace tab location
> '/home/edb/PGsources/postgresql/inst/bin/test';
> create database test tablespace tab;
> \c test
> create table t( a int PRIMARY KEY,b text);
> CREATE OR REPLACE FUNCTION large_val() RETURNS TEXT LANGUAGE SQL AS
> 'select array_agg(md5(g::text))::text from generate_series(1, 256) g';
> insert into t values (generate_series(1,10), large_val());
> alter table t set tablespace tab1 ;
> \c postgres
> create database test1 template test;
> \c test1
> alter table t set tablespace tab;
> \c postgres
> alter database test1 set tablespace tab1;
>
> --cancel the below command
> alter database test1 set tablespace pg_default; --press ctrl+c
> \c test1
> alter table t set tablespace tab1;
>
>
> Log file attached for reference.
>
> Thanks.
> --
> Regards,
> Neha Sharma
>
>
> On Thu, Dec 16, 2021 at 4:17 PM Dilip Kumar  wrote:
>
>> On Thu, Dec 16, 2021 at 12:15 AM Bruce Momjian  wrote:
>> >
>> > On Thu, Dec  2, 2021 at 07:19:50PM +0530, Dilip Kumar wrote:
>> > From the patch:
>> >
>> > > Currently, CREATE DATABASE forces a checkpoint, then copies all the
>> files,
>> > > then forces another checkpoint. The comments in the createdb()
>> function
>> > > explain the reasons for this. The attached patch fixes this problem
>> by making
>> > > create database completely WAL logged so that we can avoid the
>> checkpoints.
>> > >
>> > > This can also be useful for supporting the TDE. For example, if we
>> need different
>> > > encryption for the source and the target database then we can not
>> re-encrypt the
>> > > page data if we copy the whole directory.  But with this patch, we
>> are copying
>> > > page by page so we have an opportunity to re-encrypt the page before
>> copying that
>> > > to the target database.
>> >
>> > Uh, why is this true?  Why can't we just copy the heap/index files 8k at
>> > a time and reencrypt them during the file copy, rather than using shared
>> > buffers?
>>
>> Hi Bruce,
>>
>> Yeah, you are right that if we copy in 8k block then we can re-encrypt
>> the page, but in the current system, we are not copying block by
>> block.  So the 

Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2021-12-20 Thread Peter Geoghegan
On Mon, Dec 20, 2021 at 8:29 PM Masahiko Sawada  wrote:
> > Can we fully get rid of vacuum_freeze_table_age?
>
> Does it mean that a vacuum always is an aggressive vacuum?

No. Just somewhat more like one. Still no waiting for cleanup locks,
though. Also, autovacuum is still cancelable (strictly speaking the
non-cancelability comes from anti-wraparound VACUUM, but you know what
I mean). And there shouldn't
be a noticeable difference in terms of how many blocks can be skipped
using the VM.

> If opportunistic freezing works well on all tables, we might no longer
> need vacuum_freeze_table_age. But I’m not sure that’s true since the
> cost of freezing tuples is not 0.

That's true, of course, but right now the only goal of opportunistic
freezing is to advance relfrozenxid in every VACUUM. It needs to be
shown to be worth it, of course. But let's assume that it is worth it,
for a moment (perhaps only because we optimize freezing itself in
passing) -- then there is little use for vacuum_freeze_table_age, that
I can see.

> I think that that's true for (mostly) static tables. But regarding
> constantly-updated tables, since autovacuum runs based on the number
> of garbage tuples (or inserted tuples) and how old the relfrozenxid is
> if an autovacuum could not advance the relfrozenxid because it could
> not get a cleanup lock on the page that has the single oldest XID,
> it's likely that when autovacuum runs next time it will have to
> process other pages too since the page will get dirty enough.

I'm not arguing that the age of the single oldest XID is *totally*
irrelevant. Just that it's typically much less important than the
total amount of work we'd have to do (freezing) to be able to advance
relfrozenxid.

In any case, the extreme case where we just cannot get a cleanup lock
on one particular page with an old XID is probably very rare.

> It might be a good idea that we remember pages where we could not get
> a cleanup lock somewhere and revisit them after index cleanup. While
> revisiting the pages, we don’t prune the page but only freeze tuples.

Maybe, but I think that it would make more sense to not use
FreezeLimit for that at all. In an aggressive VACUUM (where we might
actually have to wait for a cleanup lock), why should we wait once the
age is over vacuum_freeze_min_age (usually 50 million XIDs)? The
official answer is "because we need to advance relfrozenxid". But why
not accept a much older relfrozenxid that is still sufficiently
young/safe, in order to avoid waiting for a cleanup lock?

In other words, what if our approach of "being diligent about
advancing relfrozenxid" makes the relfrozenxid problem worse, not
better? The problem with "being diligent" is that it is defined by
FreezeLimit (which is more or less the same thing as
vacuum_freeze_min_age), which is supposed to be about which tuples we
will freeze. That's a very different thing to how old relfrozenxid
should be or can be (after an aggressive VACUUM finishes).

> > On the other hand, the risk may be far greater if we have *many*
> > tuples that are still unfrozen, whose XIDs are only "middle aged"
> > right now. The idea behind vacuum_freeze_min_age seems to be to be
> > lazy about work (tuple freezing) in the hope that we'll never need to
> > do it, but that seems obsolete now. (It probably made a little more
> > sense before the visibility map.)
>
> Why is it obsolete now? I guess that it's still valid depending on the
> cases, for example, heavily updated tables.

Because after the 9.6 freezemap work we'll often set the all-visible
bit in the VM, but not the all-frozen bit (unless we have the
opportunistic freezing patch applied, which specifically avoids that).
When that happens, affected heap pages will still have
older-than-vacuum_freeze_min_age-XIDs after VACUUM runs, until we get
to an aggressive VACUUM. There could be many VACUUMs before the
aggressive VACUUM.

This "freezing cliff" seems like it might be a big problem, in
general. That's what I'm trying to address here.

Either way, the system doesn't really respect vacuum_freeze_min_age in
the way that it did before 9.6 -- which is what I meant by "obsolete".
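For context, the existing threshold interplay discussed in this thread can be sketched as follows, using the stock defaults for vacuum_freeze_table_age (150 million) and autovacuum_freeze_max_age (200 million); the function name is made up for illustration:

```python
def vacuum_kind(table_xid_age,
                vacuum_freeze_table_age=150_000_000,
                autovacuum_freeze_max_age=200_000_000):
    # Sketch of the current thresholds: crossing vacuum_freeze_table_age
    # makes a VACUUM aggressive; crossing autovacuum_freeze_max_age
    # forces an (aggressive) anti-wraparound autovacuum.
    if table_xid_age > autovacuum_freeze_max_age:
        return "anti-wraparound aggressive"
    if table_xid_age > vacuum_freeze_table_age:
        return "aggressive"
    return "normal"
```

Widening the gap between the two settings (e.g. raising autovacuum_freeze_max_age toward 400 million, as suggested below) enlarges the window in which an ordinary aggressive VACUUM can advance relfrozenxid before the anti-wraparound machinery kicks in.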

> I don’t have an objection to increasing autovacuum_freeze_max_age for
> now. One of my concerns with anti-wraparound vacuums is that too many
> tables (or several large tables) will reach autovacuum_freeze_max_age
> at once, using up autovacuum slots and preventing autovacuums from
> being launched on tables that are heavily being updated.

I think that the patch helps with that, actually -- there tends to be
"natural variation" in the relfrozenxid age of each table, which comes
from per-table workload characteristics.

> Given these
> works, expanding the gap between vacuum_freeze_table_age and
> autovacuum_freeze_max_age would have better chances for the tables to
> advance its relfrozenxid by an aggressive vacuum instead of an
> anti-wraparound-aggressive vacuum. 400 million seems to be a good
> start.

The idea behind getting rid of 

Re: row filtering for logical replication

2021-12-20 Thread vignesh C
On Mon, Dec 20, 2021 at 8:41 AM houzj.f...@fujitsu.com
 wrote:
>
> On Fri, Dec 17, 2021 6:09 PM Amit Kapila  wrote:
> > On Fri, Dec 17, 2021 at 4:11 AM Peter Smith  wrote:
> > >
> > > PSA the v47* patch set.
> Thanks for the comments, I agree with all the comments.
> Attach the V49 patch set, which addressed all the above comments on the 0002
> patch.

While reviewing the patch, I was testing a scenario where we change
the row filter condition and refresh the publication. In this case we
do not identify the row filter change, and the table data is not synced
with the publisher. In contrast, when the table itself is set, we sync
the data from the publisher. I'm not sure if the behavior is right or not.
Publisher session(setup publication):
-
create table t1(c1 int);
insert into t1 values(11);
insert into t1 values(12);
insert into t1 values(1);
select * from t1;
 c1
----
 11
 12
  1
(3 rows)
create publication pub1 for table t1 where ( c1 > 10);

Subscriber session(setup subscription):
-
create table t1(c1 int);
create subscription sub1 connection 'dbname=postgres host=localhost'
publication pub1;
select * from t1;
 c1
----
 11
 12
(2 rows)

Publisher session(alter the row filter condition):
-
alter publication pub1 set table t1 where ( c1 < 10);

Subscriber session(Refresh):
-
alter subscription sub1 refresh publication ; -- After refresh, c1
with 1 is not fetched
select * from t1;
 c1
----
 11
 12
(2 rows)

Should we do a table sync in this case, should the user be responsible
for syncing the data from the publisher, or should we throw an error to
avoid confusion? If the existing behavior is fine, we can document it.

Thoughts?

Regards,
Vignesh




Re: BufferAlloc: don't take two simultaneous locks

2021-12-20 Thread Yura Sokolov
On Sat, 02/10/2021 at 01:25 +0300, Yura Sokolov wrote:
> Good day.
> 
> I found some opportunity in Buffer Manager code in BufferAlloc
> function:
> - When valid buffer is evicted, BufferAlloc acquires two partition
> lwlocks: for partition for evicted block is in and partition for new
> block placement.
> 
> It doesn't matter if there is a small number of concurrent replacements.
> But if there are a lot of concurrent backends replacing buffers, a
> complex dependency net quickly arises.
> 
> It could be easily seen with select-only pgbench with scale 100 and
> shared buffers 128MB: scale 100 produces 1.5GB tables, and it certainly
> doesn't fit shared buffers. This way performance starts to degrade at
> ~100 connections. Even with shared buffers 1GB it slowly degrades after
> 150 connections. 
> 
> But strictly speaking, there is no need to hold both locks
> simultaneously. The buffer is pinned, so other processes cannot select
> it for eviction. If the tag is cleared and the buffer removed from the
> old partition, then other processes will not find it. Therefore it is
> safe to release the old partition lock before acquiring the new
> partition lock.
> 
> If another process concurrently inserts the same new buffer, then the
> old buffer is placed on the buffer manager's freelist.
> 
> Additional optimisation: in case the old buffer is reused, there is no
> need to put its BufferLookupEnt into dynahash's freelist. That reduces
> lock contention a bit more. To accomplish this, FreeListData.nentries
> is changed to pg_atomic_u32/pg_atomic_u64 and atomic
> increment/decrement is used.
> 
> Remark: there was a bug in `hash_update_hash_key`: nentries were not
> kept in sync if the freelist partitions differ. This bug was never
> triggered because the single use of `hash_update_hash_key` doesn't move
> entries between partitions.
> 
> There is some tests results.
> 
> - pgbench with scale 100 was tested with --select-only (since we want
> to test the buffer manager alone). It produces a 1.5GB table.
> - two shared_buffers values were tested: 128MB and 1GB.
> - the second-best result was taken among five runs
> 
> Tests were made in three system configurations:
> - notebook with i7-1165G7 (limited to 2.8GHz to not overheat)
> - Xeon X5675 6 core 2 socket NUMA system (12 cores/24 threads).
> - same Xeon X5675 but restricted to single socket
>   (with numactl -m 0 -N 0)
> 
> Results for i7-1165G7:
> 
>   conns | master |patched |  master 1G | patched 1G 
> ++++
>   1 |  29667 |  29079 |  29425 |  29411 
>   2 |  55577 |  3 |  57974 |  57223 
>   3 |  87393 |  87924 |  87246 |  89210 
>   5 | 136222 | 136879 | 133775 | 133949 
>   7 | 179865 | 176734 | 178297 | 175559 
>  17 | 215953 | 214708 | 222908 | 223651 
>  27 | 211162 | 213014 | 220506 | 219752 
>  53 | 211620 | 218702 | 220906 | 225218 
>  83 | 213488 | 221799 | 219075 | 228096 
> 107 | 212018 | 222110 | 222502 | 227825 
> 139 | 207068 | 220812 | 218191 | 226712 
> 163 | 203716 | 220793 | 213498 | 226493 
> 191 | 199248 | 217486 | 210994 | 221026 
> 211 | 195887 | 217356 | 209601 | 219397 
> 239 | 193133 | 215695 | 209023 | 218773 
> 271 | 190686 | 213668 | 207181 | 219137 
> 307 | 188066 | 214120 | 205392 | 218782 
> 353 | 185449 | 213570 | 202120 | 217786 
> 397 | 182173 | 212168 | 201285 | 216489 
> 
> Results for 1 socket X5675
> 
>   conns | master |patched |  master 1G | patched 1G 
> ++++
>   1 |  16864 |  16584 |  17419 |  17630 
>   2 |  32764 |  32735 |  34593 |  34000 
>   3 |  47258 |  46022 |  49570 |  47432 
>   5 |  64487 |  64929 |  68369 |  68885 
>   7 |  81932 |  82034 |  87543 |  87538 
>  17 | 114502 | 114218 | 127347 | 127448 
>  27 | 116030 | 115758 | 130003 | 128890 
>  53 | 116814 | 117197 | 131142 | 131080 
>  83 | 114438 | 116704 | 130198 | 130985 
> 107 | 113255 | 116910 | 129932 | 131468 
> 139 | 111577 | 116929 | 129012 | 131782 
> 163 | 110477 | 116818 | 128628 | 131697 
> 191 | 109237 | 116672 | 127833 | 131586 
> 211 | 108248 | 116396 | 127474 | 131650 
> 239 | 107443 | 116237 | 126731 | 131760 
> 271 | 106434 | 115813 | 126009 | 131526 
> 307 | 105077 | 115542 | 125279 | 131421 
> 353 | 104516 | 115277 | 124491 | 131276 
> 397 |  

Re: parallel vacuum comments

2021-12-20 Thread Amit Kapila
On Tue, Dec 21, 2021 at 10:05 AM Masahiko Sawada  wrote:
>
> On Tue, Dec 21, 2021 at 12:05 PM Amit Kapila  wrote:
> >
> > On Mon, Dec 20, 2021 at 6:29 PM Masahiko Sawada  
> > wrote:
>
> >  BTW, if we go with that then we should set the correct phase
> > for workers as well?
>
> If we have separate error context for the leader (vacuumlazy.c) and
> workers (vacuumparallel.c), workers don't necessarily need to have the
> phases such as  VACUUM_ERRCB_PHASE_VACUUM_INDEX and
> VACUUM_ERRCB_PHASE_INDEX_CLEANUP. They can use PVIndVacStatus in the
> error callback function as the patch does.
>

Okay. One minor point, let's change comments atop vacuum.c considering
the movement of new functions.


-- 
With Regards,
Amit Kapila.




Re: Getting rid of regression test input/ and output/ files

2021-12-20 Thread Greg Stark
On Sun, 19 Dec 2021 at 18:41, Corey Huinker  wrote:
>
> Which brings up a tangential question, is there value in having something 
> that brings in one or more env vars as psql vars directly. I'm thinking 
> something like:
>
> \importenv pattern [prefix]

Oof. That gives me the security heebie jeebies. Off the top of my head
PHP, CGI, SSH have all dealt with vulnerabilities caused by
accidentally importing variables they didn't intend to.

-- 
greg




Re: do only critical work during single-user vacuum?

2021-12-20 Thread Peter Geoghegan
On Mon, Dec 20, 2021 at 8:40 PM Masahiko Sawada  wrote:
> BTW a vacuum automatically enters failsafe mode under the situation
> where the user has to run a vacuum in the single-user mode, right?

Only for the table that had the problem. Maybe there are no other
tables that a database level "VACUUM" will need to spend much time on,
or maybe there are, and they will make it take much much longer (it
all depends).

The goal of the patch is to make sure that when we're in single user
mode, we'll consistently trigger the failsafe, for every VACUUM
against every table -- not just the table (or tables) whose
relfrozenxid is very old. That's still naive, but much less naive than
simply telling users to VACUUM the whole database in single user mode
while vacuuming indexes, etc.


--
Peter Geoghegan




Re: do only critical work during single-user vacuum?

2021-12-20 Thread Masahiko Sawada
On Tue, Dec 21, 2021 at 12:46 PM Andres Freund  wrote:
>
> Hi,
>
> On 2021-12-20 17:17:26 -0800, Peter Geoghegan wrote:
> > On Thu, Dec 9, 2021 at 8:41 PM Bossart, Nathan  wrote:
> > > I like the idea of having a built-in function that does the bare
> > > minimum to resolve wraparound emergencies, and I think providing some
> > > sort of simple progress indicator (even if rudimentary) would be very
> > > useful.
> >
> > If John doesn't have time to work on this during the Postgres 15
> > cycle, and if nobody else picks it up, then we should at least do the
> > bare minimum here: force the use of the failsafe in single user mode
> > (regardless of the age of relfrozenxid/relminmxid, which in general
> > might not be that old in tables where VACUUM might need to do a lot of
> > work). Attached quick and dirty patch shows what this would take. If
> > nothing else, it seems natural to define running any VACUUM in single
> > user mode as an emergency.
>
> As I said before I think this is a bad idea. I'm fine with adding a vacuum
> parameter forcing failsafe mode. And perhaps a hint to suggest it in single
> user mode. But forcing it is a bad idea - single user isn't just used for
> emergencies (c.f. initdb, which this patch would regress) and not every
> emergency making single user mode useful is related to wraparound.

+1

BTW a vacuum automatically enters failsafe mode under the situation
where the user has to run a vacuum in the single-user mode, right?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: parallel vacuum comments

2021-12-20 Thread Masahiko Sawada
On Tue, Dec 21, 2021 at 12:05 PM Amit Kapila  wrote:
>
> On Mon, Dec 20, 2021 at 6:29 PM Masahiko Sawada  wrote:
> >
> > On Mon, Dec 20, 2021 at 1:08 PM Amit Kapila  wrote:
> > >
> > > > >
> > > > > 2. What is the reason for not moving
> > > > > lazy_vacuum_one_index/lazy_cleanup_one_index to vacuum.c so that they
> > > > > can be called from vacuumlazy.c and vacuumparallel.c? Without this
> > > > > refactoring patch, I think both leader and workers set the same error
> > > > > context phase (VACUUM_ERRCB_PHASE_VACUUM_INDEX) during index
> > > > > vacuuming? Is it because you want a separate context phase for a
> > > > > parallel vacuum?
> > > >
> > > > Since the phases defined as VacErrPhase like
> > > > VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP etc.
> > > > and error callback function, vacuum_error_callback(), are specific to
> > > > heap, I thought it'd not be a good idea to move
> > > > lazy_vacuum/cleanup_one_index() so that both vacuumlazy.c and
> > > > vacuumparallel.c can use the phases and error callback function.
> > > >
> > >
> > > How about exposing it via heapam.h? We have already exposed a few
> > > things via heapam.h (see /* in heap/vacuumlazy.c */). In the current
> > > proposal, we need to have separate callbacks and phases for index
> > > vacuuming so that it can be used by both vacuumlazy.c and
> > > vacuumparallel.c which might not be a good idea.
> >
> > Yeah, but if we expose VacErrPhase and vacuum_error_callback(), we
> > need to also expose LVRelState and vacuumparallel.c also uses it,
> > which seems not a good idea. So we will need to introduce a new struct
> > dedicated to the error callback function. Is that right?
> >
>
> Right, but that also doesn't sound good to me.

Me too.

>  I think it is better to
> keep a separate error context for parallel vacuum workers as you have
> in the patch. However, let's add some comments to indicate that if
> there is a change in one of the context functions, the other should be
> changed.

Agreed.

>  BTW, if we go with that then we should set the correct phase
> for workers as well?

If we have separate error context for the leader (vacuumlazy.c) and
workers (vacuumparallel.c), workers don't necessarily need to have the
phases such as  VACUUM_ERRCB_PHASE_VACUUM_INDEX and
VACUUM_ERRCB_PHASE_INDEX_CLEANUP. They can use PVIndVacStatus in the
error callback function as the patch does.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2021-12-20 Thread Masahiko Sawada
On Sat, Dec 18, 2021 at 11:29 AM Peter Geoghegan  wrote:
>
> On Thu, Dec 16, 2021 at 10:46 PM Masahiko Sawada  
> wrote:
> > > My emphasis here has been on making non-aggressive VACUUMs *always*
> > > advance relfrozenxid, outside of certain obvious edge cases. And so
> > > with all the patches applied, up to and including the opportunistic
> > > freezing patch, every autovacuum of every table manages to advance
> > > relfrozenxid during benchmarking -- usually to a fairly recent value.
> > > I've focussed on making aggressive VACUUMs (especially anti-wraparound
> > > autovacuums) a rare occurrence, for truly exceptional cases (e.g.,
> > > user keeps canceling autovacuums, maybe due to automated script that
> > > performs DDL). That has taken priority over other goals, for now.
> >
> > Great!
>
> Maybe this is a good time to revisit basic questions about VACUUM. I
> wonder if we can get rid of some of the GUCs for VACUUM now.
>
> Can we fully get rid of vacuum_freeze_table_age?

Does that mean a vacuum would always be an aggressive vacuum? If
opportunistic freezing works well on all tables, we might no longer
need vacuum_freeze_table_age. But I'm not sure that's true, since the
cost of freezing tuples is not 0.

> We probably shouldn't be using any units, but using XIDs "feels wrong"
> to me. Even with my patch, it is theoretically possible that we won't
> be able to advance relfrozenxid very much, because we cannot get a
> cleanup lock on one single heap page with one old XID. But even in
> this extreme case, how relevant is the "age" of this old XID, really?
> What really matters is whether or not we can advance relfrozenxid in
> time (with time to spare). And so the wraparound risk of the system is
> not affected all that much by the age of the single oldest XID. The
> risk mostly comes from how much total work we still need to do to
> advance relfrozenxid. If the single old XID is quite old indeed (~1.5
> billion XIDs), but there is only one, then we just have to freeze one
> tuple to be able to safely advance relfrozenxid (maybe advance it by a
> huge amount!). How long can it take to freeze one tuple, with the
> freeze map, etc?

I think that's true for (mostly) static tables. But for
constantly-updated tables, since autovacuum runs based on the number
of garbage tuples (or inserted tuples) and how old the relfrozenxid is,
if an autovacuum could not advance the relfrozenxid because it could
not get a cleanup lock on the page that has the single oldest XID,
it's likely that when autovacuum runs next time it will have to
process other pages too, since the table will have become dirty enough.

It might be a good idea to remember the pages where we could not get
a cleanup lock and revisit them after index cleanup. While revisiting
the pages, we don't prune them but only freeze tuples.

>
> On the other hand, the risk may be far greater if we have *many*
> tuples that are still unfrozen, whose XIDs are only "middle aged"
> right now. The idea behind vacuum_freeze_min_age seems to be to be
> lazy about work (tuple freezing) in the hope that we'll never need to
> do it, but that seems obsolete now. (It probably made a little more
> sense before the visibility map.)

Why is it obsolete now? I guess that it's still valid depending on the
cases, for example, heavily updated tables.

>
> Using XIDs makes sense for things like autovacuum_freeze_max_age,
> because there we have to worry about wraparound and relfrozenxid
> (whether or not we like it). But with this patch, and with everything
> else (the failsafe, insert-driven autovacuums, everything we've done
> over the last several years) I think that it might be time to increase
> the autovacuum_freeze_max_age default. Maybe even to something as high
> as 800 million transaction IDs, but certainly to 400 million. What do
> you think? (Maybe don't answer just yet, something to think about.)

I don’t have an objection to increasing autovacuum_freeze_max_age for
now. One of my concerns with anti-wraparound vacuums is that too many
tables (or several large tables) will reach autovacuum_freeze_max_age
at once, using up autovacuum slots and preventing autovacuums from
being launched on tables that are heavily updated. Given this
work, expanding the gap between vacuum_freeze_table_age and
autovacuum_freeze_max_age would give the tables a better chance to
advance their relfrozenxid by an aggressive vacuum instead of an
anti-wraparound aggressive vacuum. 400 million seems to be a good
start.

>
> > +   vacrel->aggressive = aggressive;
> > vacrel->failsafe_active = false;
> > vacrel->consider_bypass_optimization = true;
> >
> > How about adding skipwithvm to LVRelState too?
>
> Agreed -- it's slightly better that way. Will change this.
>
> >  */
> > -   if (skipping_blocks && !FORCE_CHECK_PAGE())
> > +   if (skipping_blocks && blkno < nblocks - 1)
> >
> > Why do we 

Re: row filtering for logical replication

2021-12-20 Thread Amit Kapila
On Tue, Dec 21, 2021 at 12:28 AM Euler Taveira  wrote:
>
> On Mon, Dec 20, 2021, at 12:10 AM, houzj.f...@fujitsu.com wrote:
>
> Attach the V49 patch set, which addressed all the above comments on the 0002
> patch.
>
> I've been testing the latest versions of this patch set. I'm attaching a new
> patch set based on v49. The suggested fixes are in separate patches after the
> current one so it is easier to integrate them into the related patch. The
> majority of these changes explain some decisions to improve readability IMO.
>
> row-filter x row filter. I'm not a native speaker but "row filter" is widely
> used in similar contexts so I suggest to use it. (I didn't adjust the commit
> messages)
>
> An ancient patch use the term coerce but it was changed to cast. Coercion
> implies an implicit conversion [1]. If you look at a few lines above you will
> see that this expression expects an implicit conversion.
>
> I modified the query to obtain the row filter expressions to (i) add the 
> schema
> pg_catalog to some objects and (ii) use NOT EXISTS instead of subquery (it
> reads better IMO).
>

Yeah, I think that reads better, but maybe we can check the plans
of both queries once and see if there is any significant difference
between the two.

> A detail message requires you to capitalize the first word of sentences and
> includes a period at the end.
>
> It seems all server messages and documentation use the terminology "WHERE
> clause". Let's adopt it instead of "row filter".
>
> I reviewed 0003. It uses TupleTableSlot instead of HeapTuple. I probably 
> missed
> the explanation but it requires more changes (logicalrep_write_tuple and 3 new
> entries into RelationSyncEntry). I replaced this patch with a slightly
different one (0005 in this patch set) that uses HeapTuple instead. I did
only simple tests and it requires more tests.
> include a test to cover the case where TOASTed values are not included in the
> new tuple. We should probably add one.
>

Yeah, it would be good to add such a test.

-- 
With Regards,
Amit Kapila.




FW: Question about HEAP_XMIN_COMMITTED

2021-12-20 Thread haiming
Hello, hackers.
I have the same question. 
Hope to reply, thanks.

-Original Message-
From: liuhuail...@fujitsu.com  
Sent: Tuesday, December 14, 2021 4:55 PM
To: pgsql-hack...@postgresql.org
Subject: Question about HEAP_XMIN_COMMITTED

Hi

I did the following steps on PG.

1. Building a synchronous streaming replication environment.
2. Executing the following SQL statements on primary
  (1) postgres=# CREATE EXTENSION pageinspect;
  (2) postgres=# begin;
  (3) postgres=# select txid_current();
  (4) postgres=# create table mytest6(i int);
  (6) postgres=# insert into mytest6 values(1);
  (7) postgres=# commit;
3. Executing the following SQL statements on standby
  (8) postgres=# select * from mytest6;
 i
---
 1
(1 row)
  (9) postgres=# SELECT t_infomask FROM 
heap_page_items(get_raw_page('pg_class', 0)) where t_xmin=502※;
 t_infomask
------------
       2049
(1 row)
  ※502 is the transaction ID returned by step (3) above.

In the result of step (9), the value of the t_infomask field is 2049 (0x801),
which means that the HEAP_XMAX_INVALID and HEAP_HASNULL flags were set, but
the HEAP_XMIN_COMMITTED flag was not set.

According to the source, when step (8) was executed, the SetHintBits function
was called to set HEAP_XMIN_COMMITTED. However, the minRecoveryPoint value had
not been updated, so the HEAP_XMIN_COMMITTED flag could not be set.

After a checkpoint, selecting from mytest6 again in another session shows that
the HEAP_XMIN_COMMITTED flag was set.

So my question is: before the checkpoint, the HEAP_XMIN_COMMITTED flag was not
set correctly, right?

Or do we need to move minRecoveryPoint forward to make the HEAP_XMIN_COMMITTED
flag get set correctly on the first select from mytest6?


Best Regards, LiuHuailing
--
Regards,
Liu Huailing
--
Liu Huailing
Development Department III
Software Division II
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
ADDR.: No.6 Wenzhu Road, Software Avenue,
   Nanjing, 210012, China
TEL  : +86+25-86630566-8439
COINS: 7998-8439
FAX  : +86+25-83317685
MAIL : liuhuail...@cn.fujitsu.com
--






 

Re: do only critical work during single-user vacuum?

2021-12-20 Thread Andres Freund
Hi,

On 2021-12-20 17:17:26 -0800, Peter Geoghegan wrote:
> On Thu, Dec 9, 2021 at 8:41 PM Bossart, Nathan  wrote:
> > I like the idea of having a built-in function that does the bare
> > minimum to resolve wraparound emergencies, and I think providing some
> > sort of simple progress indicator (even if rudimentary) would be very
> > useful.
> 
> If John doesn't have time to work on this during the Postgres 15
> cycle, and if nobody else picks it up, then we should at least do the
> bare minimum here: force the use of the failsafe in single user mode
> (regardless of the age of relfrozenxid/relminmxid, which in general
> might not be that old in tables where VACUUM might need to do a lot of
> work). Attached quick and dirty patch shows what this would take. If
> nothing else, it seems natural to define running any VACUUM in single
> user mode as an emergency.

As I said before I think this is a bad idea. I'm fine with adding a vacuum
parameter forcing failsafe mode. And perhaps a hint to suggest it in single
user mode. But forcing it is a bad idea - single user isn't just used for
emergencies (c.f. initdb, which this patch would regress) and not every
emergency making single user mode useful is related to wraparound.

Greetings,

Andres Freund




Re: pg_upgrade should truncate/remove its logs before running

2021-12-20 Thread Justin Pryzby
On Mon, Dec 20, 2021 at 08:21:51PM +0900, Michael Paquier wrote:
> On Fri, Dec 17, 2021 at 11:21:13AM -0600, Justin Pryzby wrote:

> +   log_opts.basedir = "pg_upgrade_log.d";

> we could choose something simpler for the default, like
> "pg_upgrade_log".  I don't have a good history in naming new things,
> though :)

I specifically called it .d to make it obvious that it's a dir - nearly
everything that ends in "log" is a file, so people are likely to run "rm" and
"less" on it - including myself.

> .gitignore should be updated, I guess?

Are you suggesting to remove these ?
-/pg_upgrade_internal.log
-/reindex_hash.sql
-/loadable_libraries.txt

> Besides, this patch has no documentation.

TBH I'm not even sure if the dir needs to be configurable?

From 447cc989018a95871209f5305308a384d6a5fb9e Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 11 Dec 2021 19:19:50 -0600
Subject: [PATCH] pg_upgrade: write log files and dumps in subdir..

The subdir must not exist when pg_upgrade is started.
This avoids appending to logfiles which have pre-existing errors in them.
And allows more easily cleaning up after pg_upgrade.
---
 doc/src/sgml/ref/pgupgrade.sgml | 13 +++
 src/bin/pg_upgrade/.gitignore   |  3 --
 src/bin/pg_upgrade/check.c  | 10 ++---
 src/bin/pg_upgrade/dump.c   |  8 ++--
 src/bin/pg_upgrade/exec.c   |  5 ++-
 src/bin/pg_upgrade/function.c   |  3 +-
 src/bin/pg_upgrade/option.c | 23 ++--
 src/bin/pg_upgrade/pg_upgrade.c | 65 +
 src/bin/pg_upgrade/pg_upgrade.h |  1 +
 src/bin/pg_upgrade/server.c |  4 +-
 10 files changed, 92 insertions(+), 43 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index c5ce732ee98..887297317f9 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -130,6 +130,19 @@ PostgreSQL documentation
   cluster
  
 
+ 
+  -l
+  --logdir
+  Directory where SQL and log files will be written.
+  The directory will be created and must not exist before calling
+  pg_upgrade.
+  It will be removed after a successful upgrade unless
+  --retain is specified.
+  The default is pg_upgrade_log.d in the current
+  working directory.
+  
+ 
+
  
   -N
   --no-sync
diff --git a/src/bin/pg_upgrade/.gitignore b/src/bin/pg_upgrade/.gitignore
index 2d3bfeaa502..7222f8a6b1f 100644
--- a/src/bin/pg_upgrade/.gitignore
+++ b/src/bin/pg_upgrade/.gitignore
@@ -1,9 +1,6 @@
 /pg_upgrade
 # Generated by test suite
-/pg_upgrade_internal.log
 /delete_old_cluster.sh
 /delete_old_cluster.bat
-/reindex_hash.sql
-/loadable_libraries.txt
 /log/
 /tmp_check/
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 1b8ef79242b..9bc4e106dd3 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -552,7 +552,7 @@ create_script_for_old_cluster_deletion(char **deletion_script_file_name)
 
 	prep_status("Creating script to delete old cluster");
 
-	if ((script = fopen_priv(*deletion_script_file_name, "w")) == NULL)
+	if ((script = fopen_priv(*deletion_script_file_name, "w")) == NULL) // OK ?
 		pg_fatal("could not open file \"%s\": %s\n",
  *deletion_script_file_name, strerror(errno));
 
@@ -784,7 +784,7 @@ check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster)
 		return;
 	}
 
-	snprintf(output_path, sizeof(output_path),
+	snprintf(output_path, sizeof(output_path), "%s/%s", log_opts.basedir,
 			 "contrib_isn_and_int8_pass_by_value.txt");
 
 	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
@@ -861,7 +861,7 @@ check_for_user_defined_postfix_ops(ClusterInfo *cluster)
 
 	prep_status("Checking for user-defined postfix operators");
 
-	snprintf(output_path, sizeof(output_path),
+	snprintf(output_path, sizeof(output_path), "%s/%s", log_opts.basedir,
 			 "postfix_ops.txt");
 
 	/* Find any user defined postfix operators */
@@ -960,7 +960,7 @@ check_for_tables_with_oids(ClusterInfo *cluster)
 
 	prep_status("Checking for tables WITH OIDS");
 
-	snprintf(output_path, sizeof(output_path),
+	snprintf(output_path, sizeof(output_path), "%s/%s", log_opts.basedir,
 			 "tables_with_oids.txt");
 
 	/* Find any tables declared WITH OIDS */
@@ -1215,7 +1215,7 @@ check_for_user_defined_encoding_conversions(ClusterInfo *cluster)
 
 	prep_status("Checking for user-defined encoding conversions");
 
-	snprintf(output_path, sizeof(output_path),
+	snprintf(output_path, sizeof(output_path), "%s/%s", log_opts.basedir,
 			 "encoding_conversions.txt");
 
 	/* Find any user defined encoding conversions */
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 90060d0f8eb..fe979e0df48 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -22,10 +22,10 @@ generate_old_dump(void)
 	/* run new pg_dumpall binary for globals */
 	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
 			  "\"%s/pg_dumpall\" %s --globals-only --quote-all-identifiers "
-	

Re: parallel vacuum comments

2021-12-20 Thread Amit Kapila
On Mon, Dec 20, 2021 at 6:29 PM Masahiko Sawada  wrote:
>
> On Mon, Dec 20, 2021 at 1:08 PM Amit Kapila  wrote:
> >
> > > >
> > > > 2. What is the reason for not moving
> > > > lazy_vacuum_one_index/lazy_cleanup_one_index to vacuum.c so that they
> > > > can be called from vacuumlazy.c and vacuumparallel.c? Without this
> > > > refactoring patch, I think both leader and workers set the same error
> > > > context phase (VACUUM_ERRCB_PHASE_VACUUM_INDEX) during index
> > > > vacuuming? Is it because you want a separate context phase for a
> > > > parallel vacuum?
> > >
> > > Since the phases defined as VacErrPhase like
> > > VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP etc.
> > > and error callback function, vacuum_error_callback(), are specific to
> > > heap, I thought it'd not be a good idea to move
> > > lazy_vacuum/cleanup_one_index() so that both vacuumlazy.c and
> > > vacuumparallel.c can use the phases and error callback function.
> > >
> >
> > How about exposing it via heapam.h? We have already exposed a few
> > things via heapam.h (see /* in heap/vacuumlazy.c */). In the current
> > proposal, we need to have separate callbacks and phases for index
> > vacuuming so that it can be used by both vacuumlazy.c and
> > vacuumparallel.c which might not be a good idea.
>
> Yeah, but if we expose VacErrPhase and vacuum_error_callback(), we
> need to also expose LVRelState and vacuumparallel.c also uses it,
> which seems not a good idea. So we will need to introduce a new struct
> dedicated to the error callback function. Is that right?
>

Right, but that also doesn't sound good to me. I think it is better to
keep a separate error context for parallel vacuum workers as you have
in the patch. However, let's add some comments to indicate that if
there is a change in one of the context functions, the other should be
changed. BTW, if we go with that then we should set the correct phase
for workers as well?

-- 
With Regards,
Amit Kapila.




Re: sequences vs. synchronous replication

2021-12-20 Thread Tomas Vondra

On 12/21/21 02:01, Tom Lane wrote:
> Tomas Vondra  writes:
>> OK, I did a quick test with two very simple benchmarks - simple select
>> from a sequence, and 'pgbench -N' on scale 1. Benchmark was on current
>> master, patched means SEQ_LOG_VALS was set to 1.
>
> But ... pgbench -N doesn't use sequences at all, does it?
>
> Probably inserts into a table with a serial column would constitute a
> plausible real-world case.



D'oh! For some reason I thought pgbench has a sequence on the history 
table, but clearly I was mistaken. There's another thinko, because after 
inspecting pg_waldump output I realized "SEQ_LOG_VALS 1" actually logs 
only every 2nd increment. So it should be "SEQ_LOG_VALS 0".


So I repeated the test fixing SEQ_LOG_VALS, and doing the pgbench with a 
table like this:


  create table test (a serial, b int);

and a script doing

  insert into test (b) values (1);

The results look like this:

1) select nextval('s');

 clients |      1 |      4
---------+--------+--------
 master  |  39533 | 124998
 patched |   3748 |   9114
---------+--------+--------
 diff    |   -91% |   -93%


2) insert into test (b) values (1);

 clients |      1 |      4
---------+--------+--------
 master  |   3718 |   9188
 patched |   3698 |   9209
---------+--------+--------
 diff    |     0% |     0%

So the nextval() results are a bit worse, due to not caching 1/2 the 
nextval calls. The -90% is roughly expected, due to generating about 32x 
more WAL (and having to wait for commit).


But results for the more realistic insert workload are about the same as 
before (i.e. no measurable difference). Also kinda expected, because 
those transactions have to wait for WAL anyway.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: simplifying foreign key/RI checks

2021-12-20 Thread Corey Huinker
> Good catch, thanks.  Patch updated.
Applies clean. Passes check-world.


Re: PublicationActions - use bit flags.

2021-12-20 Thread Greg Nancarrow
On Tue, Dec 21, 2021 at 11:56 AM Tom Lane  wrote:
>
> Removing this is not good:
>
> if (relation->rd_pubactions)
> -   {
> pfree(relation->rd_pubactions);
> -   relation->rd_pubactions = NULL;
> -   }
>
> If the subsequent palloc fails, you've created a problem where
> there was none before.
>

Oops, yeah, I got carried away; if palloc() failed and called exit(),
then it would end up crashing when trying to use/pfree rd_pubactions
again.
Better leave that line in ...

> I do wonder why we have to palloc a constant-size substructure in
> the first place, especially one that is likely smaller than the
> pointer that points to it.  Maybe the struct definition should be
> moved so that we can just declare it in-line in the relcache entry?
>

I think currently it's effectively using the rd_pubactions pointer as
a boolean flag to indicate whether the publication membership info has
been fetched (so the bool flags are valid).
I guess you'd need another bool flag if you wanted to make that struct
in-line in the relcache entry.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Emit a warning if the extension's GUC is set incorrectly

2021-12-20 Thread Kyotaro Horiguchi
At Fri, 17 Dec 2021 11:25:22 +0900, Shinya Kato  
wrote in 
> On 2021-12-17 01:55, Fujii Masao wrote:
> > I'm still not sure why you were thinking that ERROR is more proper
> > here.
> 
> Since I get an ERROR when I set the wrong normal GUCs, I thought I
> should also get an ERROR when I set the wrong extension's GUCs.
> However, there are likely to be harmful effects, and I may need to
> think again.

The GUC placeholder mechanism is evidently designed so that the server
allows variables unknown to it to be defined before extension modules
are loaded.  Raising an ERROR for those variables at that point would
contradict the objective of the placeholder mechanism.

In addition, EmitWarningsOnPlaceholders()'s objective is to warn about
variables that should not exist in a *known* namespace.  The patch,
however, intends to check the variable definitions of all namespaces
seen in the configuration file, but at that time it is undeterminable
whether each of those namespaces is simply wrong or merely unknown
yet.  And the latter is, as mentioned above, what we intend to allow.

In conclusion, I think we cannot rule out "wrong" namespaces unless we
find some means to distinguish whether a namespace is wrong or just
not yet defined before loading extension modules.
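For illustration, the placeholder behavior at stake can be sketched in a
session like this (the setting name is made up; behavior as of current
releases, where extension prefixes are not reserved):

```sql
-- Before the owning module is loaded, any two-part name is accepted as
-- a placeholder; the server cannot yet tell "wrong" from "not yet defined".
SET plpgsql.no_such_setting = 'on';   -- accepted as a placeholder

-- Loading the module defines its real GUCs, and its
-- EmitWarningsOnPlaceholders() call can then warn about leftover
-- placeholders in the now-known namespace.
LOAD 'plpgsql';
```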

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Confused comment about drop replica identity index

2021-12-20 Thread wangw.f...@fujitsu.com
On Tue, Dec 20, 2021 at 19:11PM, Michael Paquier wrote:
> That's mostly fine.  I have made some adjustments as per the attached.

Thanks for reviewing.


> + The default for non-system tables. Records the old values of the 
> columns
> + of the primary key, if any. The default for non-system tables.
> The same sentence is repeated twice.
> 
> + Records no information about the old row.(This is the
> default for system tables.)
> For consistency with the rest, this could drop the parenthesis for the second
> sentence.
> 
> +   USING INDEX index_name
> This should use  as markup for index_name.

The change looks good to me.



Regards,
Wang wei




Re: do only critical work during single-user vacuum?

2021-12-20 Thread Peter Geoghegan
On Thu, Dec 9, 2021 at 8:41 PM Bossart, Nathan  wrote:
> I like the idea of having a built-in function that does the bare
> minimum to resolve wraparound emergencies, and I think providing some
> sort of simple progress indicator (even if rudimentary) would be very
> useful.

If John doesn't have time to work on this during the Postgres 15
cycle, and if nobody else picks it up, then we should at least do the
bare minimum here: force the use of the failsafe in single user mode
(regardless of the age of relfrozenxid/relminmxid, which in general
might not be that old in tables where VACUUM might need to do a lot of
work). Attached quick and dirty patch shows what this would take. If
nothing else, it seems natural to define running any VACUUM in single
user mode as an emergency.

This is really the least we could do -- it's much better than nothing,
but still really lazy^H^H^H^H conservative. I haven't revised the
assumption that the user should do a top-level "VACUUM" in databases
that can no longer allocate XIDs due to wraparound, despite the fact
that we could do far better with moderate effort. Although it might
make sense to commit something like the attached as part of a more
worked out solution (assuming it didn't fully remove single user mode
from the equation, which would be better still).

-- 
Peter Geoghegan


v1-0001-Always-trigger-failsafe-in-single-user-mode.patch
Description: Binary data


Re: Emit a warning if the extension's GUC is set incorrectly

2021-12-20 Thread Kyotaro Horiguchi
At Mon, 20 Dec 2021 21:05:23 +0900, Shinya Kato  
wrote in 
> - v6-01-Add-EmitWarningsOnPlaceholders.patch
> We should use EmitWarningsOnPlaceholders when we use
> DefineCustomXXXVariable.
> I don't think there is any room for debate.

Unfortunately, pltcl.c defines variables in both the pltcl and pltclu
namespaces, but the patch adds the warning only for pltclu.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: sequences vs. synchronous replication

2021-12-20 Thread Tom Lane
Tomas Vondra  writes:
> OK, I did a quick test with two very simple benchmarks - simple select 
> from a sequence, and 'pgbench -N' on scale 1. Benchmark was on current 
> master, patched means SEQ_LOG_VALS was set to 1.

But ... pgbench -N doesn't use sequences at all, does it?

Probably inserts into a table with a serial column would constitute a
plausible real-world case.

regards, tom lane




Re: PublicationActions - use bit flags.

2021-12-20 Thread Tom Lane
Greg Nancarrow  writes:
> I've attached a patch which addresses that and replaces a couple of
> memcpy()s with struct assignment, as suggested.

Removing this is not good:

if (relation->rd_pubactions)
-   {
pfree(relation->rd_pubactions);
-   relation->rd_pubactions = NULL;
-   }
 
If the subsequent palloc fails, you've created a problem where
there was none before.

I do wonder why we have to palloc a constant-size substructure in
the first place, especially one that is likely smaller than the
pointer that points to it.  Maybe the struct definition should be
moved so that we can just declare it in-line in the relcache entry?

regards, tom lane




Re: sequences vs. synchronous replication

2021-12-20 Thread Tomas Vondra

On 12/20/21 17:40, Tomas Vondra wrote:

On 12/20/21 15:31, Peter Eisentraut wrote:

On 18.12.21 22:48, Tomas Vondra wrote:
What do you mean by "not caching unused sequence numbers"? Reducing 
SEQ_LOG_VALS to 1, i.e. WAL-logging every sequence increment?


That'd work, but I wonder how significant the impact will be. I'd 
bet it hurts the patch adding logical decoding of sequences quite a bit.


It might be worth testing.  This behavior is ancient and has never 
really been scrutinized since it was added.




OK, I'll do some testing to measure the overhead, and I'll see how much 
it affects the sequence decoding patch.




OK, I did a quick test with two very simple benchmarks - simple select 
from a sequence, and 'pgbench -N' on scale 1. Benchmark was on current 
master, patched means SEQ_LOG_VALS was set to 1.


Average of 10 runs, each 30 seconds long, look like this:

1) select nextval('s');

 clients  1 4
--
 master   39497123137
 patched   6813 18326
--
 diff  -83%  -86%

2) pgbench -N

 clients  1 4
--
 master2935  9156
 patched   2937  9100
--
 diff0%0%


Clearly the extreme case (1) is hit pretty badly, while the much more 
likely workload (2) is almost unaffected.



I'm not sure what conclusion to make from this, but assuming almost no 
one does just nextval calls, it should be acceptable.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: row filtering for logical replication

2021-12-20 Thread Ajin Cherian
On Tue, Dec 21, 2021 at 5:58 AM Euler Taveira  wrote:
>
> I reviewed 0003. It uses TupleTableSlot instead of HeapTuple. I probably 
> missed
> the explanation but it requires more changes (logicalrep_write_tuple and 3 new
> entries into RelationSyncEntry). I replaced this patch with a slightly
> different one (0005 in this patch set) that uses HeapTuple instead. I did
> only simple tests and it requires more tests. I noticed that this patch does not
> include a test to cover the case where TOASTed values are not included in the
> new tuple. We should probably add one.

The reason I changed the code to use virtual tuple slots is to reduce
tuple deforming overhead.
Dilip raised this very valid comment in [1]:

On Tue, Sep 21, 2021 at 4:29 PM Dilip Kumar
 wrote:
>
>In pgoutput_row_filter_update(), first, we are deforming the tuple in
>local datum, then modifying the tuple, and then reforming the tuple.
>I think we can surely do better here. Currently, you are reforming
>the tuple so that you can store it in the scan slot by calling
>ExecStoreHeapTuple which will be used for expression evaluation.
>Instead of that what you need to do is to deform the tuple using
>tts_values of the scan slot and later call ExecStoreVirtualTuple(), so
>advantages are 1) you don't need to reform the tuple 2) the expression
>evaluation machinery doesn't need to deform again for fetching the
>value of the attribute, instead it can directly get from the value
>from the virtual tuple.

Storing the old tuple/new tuple in a slot and re-using the slot avoids
the overhead of repeatedly deforming the tuple at multiple levels in
the code.

regards,
Ajin Cherian
[1] - 
https://www.postgresql.org/message-id/CAFiTN-vwBjy+eR+iodkO5UVN5cPv_xx1=s8ehzgcrjza+az...@mail.gmail.com




Re: Confused comment about drop replica identity index

2021-12-20 Thread Michael Paquier
On Mon, Dec 20, 2021 at 11:57:32AM -0300, Euler Taveira wrote:
> What do you think about the attached patch? It forbids the DROP INDEX. We 
> might
> add a detail message but I didn't in this patch.

Yeah.  I'd agree about doing something like that on HEAD, and that
would help with some of the logirep-related patch currently being
worked on, as far as I understood.
--
Michael




Re: Refactoring of compression options in pg_basebackup

2021-12-20 Thread Michael Paquier
On Mon, Dec 20, 2021 at 10:19:44AM -0500, Robert Haas wrote:
> One thing we should keep in mind is that I'm also working on adding
> server-side compression, initially with gzip, but Jeevan Ladhe has
> posted patches to extend that to LZ4. So however we structure the
> options they should take that into account. Currently that patch set
> adds --server-compression={none,gzip,gzipN} where N from 1 to 9, but
> perhaps it should be done differently. Not sure.

Yeah, consistency would be good.  For the client-side compression of
LZ4, we have shaped things around the existing --compress option, and
there is 6f164e6 that offers an API to parse that at option-level,
meaning less custom error strings.  I'd like to think that splitting
the compression level and the compression method is still the right
choice, except if --server-compression combined with a client-side
compression is a legal combination.  This would not really make sense,
though, no?  So we'd better block this possibility from the start?

>> One more thing that I have noticed while hacking this stuff is that we
>> have no regression tests for gzip with pg_basebackup, so I have added
>> some that are skipped when not compiling the code with ZLIB.
> 
> If they don't decompress the backup and run pg_verifybackup on it then
> I'm not sure how much they help. Yet, I don't know how to do that
> portably.

They help in checking that an environment does not use a buggy set of
GZIP, at least.  Using pg_verifybackup on a base backup with
--format='t' could be tweaked with $ENV{TAR} for the tarballs
generation, for example, as we do in some other tests.  Option sets
like "xvf" or "zxvf" should be rather portable across the buildfarm,
no?  I'd like to think that this is not a requirement for adding
checks in the compression path, as a first step, though, but I agree
that it could be extended more.
--
Michael




Re: PublicationActions - use bit flags.

2021-12-20 Thread Greg Nancarrow
On Tue, Dec 21, 2021 at 4:14 AM Alvaro Herrera  wrote:
>
> On 2021-Dec-20, Peter Eisentraut wrote:
>
> > I don't see why this is better.  It just makes the code longer and adds more
> > punctuation and reduces type safety.
>
> Removing one palloc is I think the most important consequence ...
> probably not a big deal though.
>
> I think we could change the memcpy calls to struct assignment, as that
> would look a bit cleaner, and call it a day.
>

I think we can all agree that returning PublicationActions as a
palloc'd struct is unnecessary.
I've attached a patch which addresses that and replaces a couple of
memcpy()s with struct assignment, as suggested.


Regards,
Greg Nancarrow
Fujitsu Australia


get_rel_pubactions_improvement.patch
Description: Binary data


Re: sqlsmith: ERROR: XX000: bogus varno: 2

2021-12-20 Thread Tom Lane
Here's a less hasty version of the patch.

regards, tom lane

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index e276264882..5d4700430c 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2201,6 +2201,26 @@ expression_tree_walker(Node *node,
 	return true;
 			}
 			break;
+		case T_PartitionBoundSpec:
+			{
+PartitionBoundSpec *pbs = (PartitionBoundSpec *) node;
+
+if (walker(pbs->listdatums, context))
+	return true;
+if (walker(pbs->lowerdatums, context))
+	return true;
+if (walker(pbs->upperdatums, context))
+	return true;
+			}
+			break;
+		case T_PartitionRangeDatum:
+			{
+PartitionRangeDatum *prd = (PartitionRangeDatum *) node;
+
+if (walker(prd->value, context))
+	return true;
+			}
+			break;
 		case T_List:
 			foreach(temp, (List *) node)
 			{
@@ -3092,6 +3112,28 @@ expression_tree_mutator(Node *node,
 return (Node *) newnode;
 			}
 			break;
+		case T_PartitionBoundSpec:
+			{
+PartitionBoundSpec *pbs = (PartitionBoundSpec *) node;
+PartitionBoundSpec *newnode;
+
+FLATCOPY(newnode, pbs, PartitionBoundSpec);
+MUTATE(newnode->listdatums, pbs->listdatums, List *);
+MUTATE(newnode->lowerdatums, pbs->lowerdatums, List *);
+MUTATE(newnode->upperdatums, pbs->upperdatums, List *);
+return (Node *) newnode;
+			}
+			break;
+		case T_PartitionRangeDatum:
+			{
+PartitionRangeDatum *prd = (PartitionRangeDatum *) node;
+PartitionRangeDatum *newnode;
+
+FLATCOPY(newnode, prd, PartitionRangeDatum);
+MUTATE(newnode->value, prd->value, Node *);
+return (Node *) newnode;
+			}
+			break;
 		case T_List:
 			{
 /*
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index e57c3aa124..9d5b8b9ab9 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -88,6 +88,9 @@ static Relids alias_relid_set(Query *query, Relids relids);
  *		Create a set of all the distinct varnos present in a parsetree.
  *		Only varnos that reference level-zero rtable entries are considered.
  *
+ * "root" can be passed as NULL if it is not necessary to process
+ * PlaceHolderVars.
+ *
  * NOTE: this is used on not-yet-planned expressions.  It may therefore find
  * bare SubLinks, and if so it needs to recurse into them to look for uplevel
  * references to the desired rtable level!	But when we find a completed
@@ -168,9 +171,13 @@ pull_varnos_walker(Node *node, pull_varnos_context *context)
 		/*
 		 * If a PlaceHolderVar is not of the target query level, ignore it,
 		 * instead recursing into its expression to see if it contains any
-		 * vars that are of the target level.
+		 * vars that are of the target level.  We'll also do that when the
+		 * caller doesn't pass a "root" pointer.  (We probably shouldn't see
+		 * PlaceHolderVars at all in such cases, but if we do, this is a
+		 * reasonable behavior.)
 		 */
-		if (phv->phlevelsup == context->sublevels_up)
+		if (phv->phlevelsup == context->sublevels_up &&
+			context->root != NULL)
 		{
 			/*
 			 * Ideally, the PHV's contribution to context->varnos is its
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 8da525c715..f73ce7ccb5 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -2561,6 +2564,12 @@ decompile_column_index_array(Datum column_index_array, Oid relId,
  * the one specified by the second parameter.  This is sufficient for
  * partial indexes, column default expressions, etc.  We also support
  * Var-free expressions, for which the OID can be InvalidOid.
+ *
+ * We expect this function to work, or throw a reasonably clean error,
+ * for any node tree that can appear in a catalog pg_node_tree column.
+ * Query trees, such as those appearing in pg_rewrite.ev_action, are
+ * excluded.  So are expressions in more than one relation, which can
+ * appear in places like pg_rewrite.ev_qual.
  * --
  */
 Datum
@@ -2622,6 +2631,8 @@ static text *
 pg_get_expr_worker(text *expr, Oid relid, const char *relname, int prettyFlags)
 {
 	Node	   *node;
+	Node	   *tst;
+	Relids		relids;
 	List	   *context;
 	char	   *exprstr;
 	char	   *str;
@@ -2634,6 +2645,40 @@ pg_get_expr_worker(text *expr, Oid relid, const char *relname, int prettyFlags)
 
 	pfree(exprstr);
 
+	/*
+	 * Throw error if the input is a querytree rather than an expression tree.
+	 * While we could support queries here, there seems no very good reason
+	 * to.  In most such catalog columns, we'll see a List of Query nodes, or
+	 * even nested Lists, so drill down to a non-List node before checking.
+	 */
+	tst = node;
+	while (tst && IsA(tst, List))
+		tst = linitial((List *) tst);
+	if (tst && IsA(tst, Query))
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("input is a query, not an expression")));
+
+	/*
+	 * Throw error if the expression contains Vars we won't 

Re: Schema variables - new implementation for Postgres 15

2021-12-20 Thread Justin Pryzby
I don't understand what 0002 patch does relative to the 0001 patch.
Is 0002 to change the error messages from "schema variables" to "session
variables" , in a separate commit to show that the main patch doesn't change
regression results ?  Could you add commit messages ?

I mentioned before that there's a pre-existing use of the phrase "session
variable", which you should change to something else:

origin:doc/src/sgml/ref/set_role.sgml:   SET ROLE does not 
process session variables as specified by
origin:doc/src/sgml/ref/set_role.sgml-   the role's ALTER ROLE settings;  this 
only happens during
origin:doc/src/sgml/ref/set_role.sgml-   login.

Maybe "session variable" should be added to the glossary.

The new tests crash if debug_discard_caches=on.

2021-12-20 16:15:44.476 CST postmaster[7478] LOG:  server process (PID 7657) 
was terminated by signal 6: Aborted
2021-12-20 16:15:44.476 CST postmaster[7478] DETAIL:  Failed process was 
running: DISCARD VARIABLES;

TRAP: FailedAssertion("sessionvars", File: "sessionvariable.c", Line: 270, PID: 
7657)

#2  0x564858a4f1a8 in ExceptionalCondition 
(conditionName=conditionName@entry=0x564858b8626d "sessionvars", 
errorType=errorType@entry=0x564858aa700b "FailedAssertion", 
fileName=fileName@entry=0x564858b86234 "sessionvariable.c", 
lineNumber=lineNumber@entry=270) at assert.c:69
#3  0x56485874fec6 in sync_sessionvars_xact_callback (event=, arg=) at sessionvariable.c:270
#4  sync_sessionvars_xact_callback (event=, arg=) 
at sessionvariable.c:253
#5  0x56485868030a in CallXactCallbacks (event=XACT_EVENT_PRE_COMMIT) at 
xact.c:3644
#6  CommitTransaction () at xact.c:2178
#7  0x564858681975 in CommitTransactionCommand () at xact.c:3043
#8  0x56485892b7a9 in finish_xact_command () at postgres.c:2722
#9  0x56485892dc5b in finish_xact_command () at postgres.c:2720
#10 exec_simple_query () at postgres.c:1240
#11 0x56485892f70a in PostgresMain () at postgres.c:4498
#12 0x56485889a479 in BackendRun (port=, port=) at postmaster.c:4594
#13 BackendStartup (port=) at postmaster.c:4322
#14 ServerLoop () at postmaster.c:1802
#15 0x56485889b47c in PostmasterMain () at postmaster.c:1474
#16 0x5648585c60c0 in main (argc=5, argv=0x564858e553f0) at main.c:198




Re: relcache not invalidated when ADD PRIMARY KEY USING INDEX

2021-12-20 Thread Euler Taveira
On Mon, Dec 20, 2021, at 3:45 AM, houzj.f...@fujitsu.com wrote:
> Hi,
> 
> When reviewing some replica identity related patches, I found that when adding
> primary key using an existing unique index on not null columns, the
> target table's relcache won't be invalidated.
Good catch.

It seems you can simplify your check by testing indexForm->indisprimary directly, no?

Why did you add new tests for test_decoding? I think the TAP tests alone are
fine. BTW, this test is similar to publisher3/subscriber3. Isn't it better to
use the same pub/sub to reduce the test execution time?


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: row filtering for logical replication

2021-12-20 Thread Peter Smith
On Tue, Dec 21, 2021 at 5:58 AM Euler Taveira  wrote:
>
> On Mon, Dec 20, 2021, at 12:10 AM, houzj.f...@fujitsu.com wrote:
>
> Attach the V49 patch set, which addressed all the above comments on the 0002
> patch.
>
> I've been testing the latest versions of this patch set. I'm attaching a new
> patch set based on v49. The suggested fixes are in separate patches after the
> current one so it is easier to integrate them into the related patch. The
> majority of these changes explains some decision to improve readability IMO.
>
> row-filter x row filter. I'm not a native speaker but "row filter" is widely
> used in similar contexts so I suggest to use it. (I didn't adjust the commit
> messages)
>
> An ancient patch use the term coerce but it was changed to cast. Coercion
> implies an implicit conversion [1]. If you look at a few lines above you will
> see that this expression expects an implicit conversion.
>
> I modified the query to obtain the row filter expressions to (i) add the 
> schema
> pg_catalog to some objects and (ii) use NOT EXISTS instead of subquery (it
> reads better IMO).
>
> A detail message requires you to capitalize the first word of sentences and
> includes a period at the end.
>
> It seems all server messages and documentation use the terminology "WHERE
> clause". Let's adopt it instead of "row filter".
>
> I reviewed 0003. It uses TupleTableSlot instead of HeapTuple. I probably 
> missed
> the explanation but it requires more changes (logicalrep_write_tuple and 3 new
> entries into RelationSyncEntry). I replaced this patch with a slightly
> different one (0005 in this patch set) that uses HeapTuple instead. I did
> only simple tests and it requires more tests. I noticed that this patch does not
> include a test to cover the case where TOASTed values are not included in the
> new tuple. We should probably add one.
>
> I agree with Amit that it is a good idea to merge 0001, 0002, and 0005. I 
> would
> probably merge 0004 because it is just isolated changes.
>
> [1] https://en.wikipedia.org/wiki/Type_conversion
>
>

Thanks for all the suggested fixes.

Next, I plan to post a new v51* patch set which will be

1. Take your "fixes"  patches, and wherever possible just merge them
back into the main patches.
2. Merge the resulting main patches according to Amit's advice.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: sqlsmith: ERROR: XX000: bogus varno: 2

2021-12-20 Thread Tom Lane
Robert Haas  writes:
> OK ... but my point is that dump and restore does work. So whatever
> cases pg_get_expr() doesn't work must be cases that aren't needed for
> that to happen. Otherwise this problem would have been found long ago.

pg_get_expr doesn't (or didn't) depend on expression_tree_walker,
so there wasn't a problem there before.  I am worried that there
might be other code paths, now or in future, that could try to apply
expression_tree_walker/mutator to relpartbound trees, which is
why I think it's a reasonable idea to teach them about such trees.

>> It does fall over if you try to apply it to stored rules:
>> regression=# select pg_get_expr(ev_action, 0) from pg_rewrite;
>> ERROR:  unrecognized node type: 232
>> I'm not terribly excited about that, but maybe we should try to
>> improve it while we're here.

> In my view, the lack of excitement about sanity checks in functions
> that deal with node trees in the catalogs is the root of this problem.

It's only a problem if you hold the opinion that there should be
no user-reachable ERRCODE_INTERNAL_ERROR errors.  Which is a fine
ideal, but I fear we're a pretty long way off from that.

> I realize that's a deep hole out of which we're unlikely to be able to
> climb in the short or even medium term, but we don't have to keep
> digging. We either make a rule that pg_get_expr() can apply to
> everything stored in the catalogs and produce sensible answers, which
> seems to be what you prefer, or we make it return nice errors for the
> cases that it can't handle nicely, or some combination of the two. And
> whatever we decide, we also document and enforce everywhere.

I think having pg_get_expr throw an error for a query, as opposed to an
expression, is fine.  What I don't want to do is subdivide things a lot
more finely than that; thus lumping "relpartbound" into "expression"
seems like a reasonable thing to do.  Especially since we already did it
six years ago.

In a quick check of catalogs with pg_node_tree columns, I find these
other columns that pg_get_expr can fail on (at least with the
examples available in the regression DB):

regression=# select count(pg_get_expr(prosqlbody,0)) from pg_proc;
ERROR:  unrecognized node type: 232
regression=# select count(pg_get_expr(tgqual,tgrelid)) from pg_trigger ;
ERROR:  bogus varno: 2

So that looks like the same cases we already knew about: input is
a querytree not an expression, or it contains Vars for more than
one relation.

regards, tom lane




Re: Allow DELETE to use ORDER BY and LIMIT/OFFSET

2021-12-20 Thread Corey Huinker
>
> Out of curiosity, could you please tell me the concrete situations
> where you wanted to delete one of two identical records?
>

In my case, there is a table with known duplicates, and we would like to
delete all but the one with the lowest ctid, and then add a unique index to
the table which then allows us to use INSERT ON CONFLICT in a meaningful
way.

The other need for a DELETE...LIMIT or UPDATE...LIMIT is when you're
worried about flooding a replica, so you parcel out the DML into chunks
that don't cause unacceptable lag on the replica.

Both of these can be accomplished via  DELETE FROM foo WHERE ctid IN (
SELECT ... FROM foo ... LIMIT 1000), but until recently such a construct
would result in a full table scan, and you'd take the same hit with each
subsequent DML.

I *believe* that the ctid range scan now can limit those scans, especially
if you can order the limited set by ctid, but those techniques are not
widely known at this time.
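For what it's worth, the batched dedup-then-index pattern described above
can be sketched like this (table and column names are made up, and the
ORDER BY ctid relies on the tid ordering operators available in recent
releases):

```sql
-- Hypothetical table foo(k int, v text) containing duplicate rows on k.
-- Delete all but the lowest-ctid row per key, in bounded batches, so a
-- replica is not flooded by one huge transaction:
DELETE FROM foo
WHERE ctid IN (
    SELECT ctid
    FROM (SELECT ctid,
                 row_number() OVER (PARTITION BY k ORDER BY ctid) AS rn
          FROM foo) s
    WHERE rn > 1
    LIMIT 1000          -- re-run until it deletes zero rows
);

-- Once no duplicates remain, add the unique index that makes
-- INSERT ON CONFLICT meaningful:
CREATE UNIQUE INDEX foo_k_uniq ON foo (k);
```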


Re: sqlsmith: ERROR: XX000: bogus varno: 2

2021-12-20 Thread Robert Haas
On Mon, Dec 20, 2021 at 2:36 PM Tom Lane  wrote:
> I'm not sure why you're astonished by that, considering that
> psql has applied pg_get_expr to relpartbound since f0e44751d,
> which was the same commit that put code into ruleutils.c to
> make pg_get_expr work on relpartbounds.
>
> It seems a bit late to change our minds on this; and anyway,
> if pg_get_expr didn't handle them, we'd just need to invent
> another function that did.

OK.

> > Alternatively, maybe pg_get_expr() should just fail and tell you that
> > this is not an expression, and if you want to see what's in that
> > column, you should use the SQL-callable functions specifically
> > provided for that purpose (pg_get_partkeydef, I think).
>
> pg_get_partkeydef does something different.
>
> regression=# select pg_get_expr(relpartbound,oid) from pg_class where relname 
> = 'beta_neg';
>pg_get_expr
> --
>  FOR VALUES FROM ('-10') TO ('0')
> (1 row)
>
> regression=# select pg_get_partkeydef('beta_neg'::regclass);
>  pg_get_partkeydef
> ---
>  RANGE (b)
> (1 row)

OK ... but my point is that dump and restore does work. So whatever
cases pg_get_expr() doesn't work must be cases that aren't needed for
that to happen. Otherwise this problem would have been found long ago.

> > I don't know
> > why it should be legitimate for pg_get_expr() to just assume that any
> > random node tree it gets handed must be an expression without doing
> > any sanity checking.
>
> It does fall over if you try to apply it to stored rules:
>
> regression=# select pg_get_expr(ev_action, 0) from pg_rewrite;
> ERROR:  unrecognized node type: 232
>
> I'm not terribly excited about that, but maybe we should try to
> improve it while we're here.

In my view, the lack of excitement about sanity checks in functions
that deal with node trees in the catalogs is the root of this problem.
I realize that's a deep hole out of which we're unlikely to be able to
climb in the short or even medium term, but we don't have to keep
digging. We either make a rule that pg_get_expr() can apply to
everything stored in the catalogs and produce sensible answers, which
seems to be what you prefer, or we make it return nice errors for the
cases that it can't handle nicely, or some combination of the two. And
whatever we decide, we also document and enforce everywhere.

I don't think it's any more correct for pg_get_expr() to elog(ERROR,
"some internal thing") than it would be for to_timestamp() or
date_bin() or whatever to do something similar. And I think that
careful thinking about supported cases makes life easier for both
users (who know that if they see some junk error report, it's a
mistake rather than intentional) and for developers (who then have a
better chance of knowing what code they need to update to avoid
getting called bozos). Sloppy thinking about which cases are supported
and unsupported leads to bugs, and some of those are likely to be
security bugs.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: sqlsmith: ERROR: XX000: bogus varno: 2

2021-12-20 Thread Tom Lane
Robert Haas  writes:
> Right. I'm not surprised that relpartbound uses those node types. I
> *am* surprised that pg_get_expr() is expected to be able to handle
> them. IOW, they ARE node trees, consonant with the fact that the
> column type is pg_node_tree, but they're NOT expressions.

I'm not sure why you're astonished by that, considering that
psql has applied pg_get_expr to relpartbound since f0e44751d,
which was the same commit that put code into ruleutils.c to
make pg_get_expr work on relpartbounds.

It seems a bit late to change our minds on this; and anyway,
if pg_get_expr didn't handle them, we'd just need to invent
another function that did.

> Alternatively, maybe pg_get_expr() should just fail and tell you that
> this is not an expression, and if you want to see what's in that
> column, you should use the SQL-callable functions specifically
> provided for that purpose (pg_get_partkeydef, I think).

pg_get_partkeydef does something different.

regression=# select pg_get_expr(relpartbound,oid) from pg_class where relname = 
'beta_neg';
   pg_get_expr
--
 FOR VALUES FROM ('-10') TO ('0')
(1 row)

regression=# select pg_get_partkeydef('beta_neg'::regclass);
 pg_get_partkeydef 
---
 RANGE (b)
(1 row)

> I don't know
> why it should be legitimate for pg_get_expr() to just assume that any
> random node tree it gets handed must be an expression without doing
> any sanity checking.

It does fall over if you try to apply it to stored rules:

regression=# select pg_get_expr(ev_action, 0) from pg_rewrite;
ERROR:  unrecognized node type: 232

I'm not terribly excited about that, but maybe we should try to
improve it while we're here.

regards, tom lane




Re: Add index scan progress to pg_stat_progress_vacuum

2021-12-20 Thread Justin Pryzby
This view also doesn't show vacuum progress across a partitioned table.

For comparison:

pg_stat_progress_create_index (added in v12) has:
partitions_total
partitions_done

pg_stat_progress_analyze (added in v13) has:
child_tables_total
child_tables_done

pg_stat_progress_cluster should have something similar.

-- 
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581




Re: Adding CI to our tree

2021-12-20 Thread Andres Freund
Hi,

Attached is v4 of the CI patch.

Changes:

- Move docker image specification out of the patch and generate them together
  with the VM images. The main reason for this is that I got worried about all
  repositories having to recreate the images - they're large.

- Moved the core dump handling for *nix systems into a helper shell script,
  they were a bit long for the .cirrus.yml. And this way the logic can be
  reused for other CI providers

- renamed the task names to include a bit more OS information

- renamed the images to remove -aio- from the name

- deduplicated a few more steps

- Address Thomas' feedback

- Try to address Peter's feedback

Regards,

Andres
From 264c0797033dd0325032e89fee0095bf75ea7fc1 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 15 Mar 2021 09:25:15 -0700
Subject: [PATCH v4 1/2] ci: Add CI for FreeBSD, Linux, MacOS and Windows,
 utilizing cirrus-ci.

Author: Andres Freund 
Author: Thomas Munro 
Author: Melanie Plageman 
Reviewed-By: Melanie Plageman 
Reviewed-By: Justin Pryzby 
Reviewed-By: Thomas Munro 
Reviewed-By: Peter Eisentraut 
Discussion: https://postgr.es/m/20211001222752.wrz7erzh4cajv...@alap3.anarazel.de
---
 .cirrus.yml | 526 
 src/tools/ci/README |  63 +++
 src/tools/ci/cores_backtrace.sh |  50 +++
 src/tools/ci/gcp_freebsd_repartition.sh |  28 ++
 src/tools/ci/pg_ci_base.conf|  14 +
 src/tools/ci/windows_build_config.pl|  13 +
 6 files changed, 694 insertions(+)
 create mode 100644 .cirrus.yml
 create mode 100644 src/tools/ci/README
 create mode 100755 src/tools/ci/cores_backtrace.sh
 create mode 100755 src/tools/ci/gcp_freebsd_repartition.sh
 create mode 100644 src/tools/ci/pg_ci_base.conf
 create mode 100644 src/tools/ci/windows_build_config.pl

diff --git a/.cirrus.yml b/.cirrus.yml
new file mode 100644
index 000..4116e0155dc
--- /dev/null
+++ b/.cirrus.yml
@@ -0,0 +1,526 @@
+# CI configuration file for CI utilizing cirrus-ci.org
+#
+# See src/tools/ci/README for information.
+
+env:
+  # Source of images / containers
+  GCP_PROJECT: pg-ci-images
+  IMAGE_PROJECT: $GCP_PROJECT
+  CONTAINER_REPO: us-docker.pkg.dev/${GCP_PROJECT}/ci
+
+  # accelerate initial clone, but a bit of depth so that concurrent tasks work
+  CIRRUS_CLONE_DEPTH: 100
+  # Useful to be able to analyse what in a script takes long
+  CIRRUS_LOG_TIMESTAMP: true
+
+  CCACHE_MAXSIZE: "500M"
+
+  # target to test, for all but windows
+  PROVE_FLAGS: --timer
+  CHECK: check-world PROVE_FLAGS=$PROVE_FLAGS
+  CHECKFLAGS: -Otarget
+  PGCTLTIMEOUT: 120
+  TEMP_CONFIG: ${CIRRUS_WORKING_DIR}/src/tools/ci/pg_ci_base.conf
+  PG_TEST_EXTRA: kerberos ldap ssl
+
+
+# What files to preserve in case tests fail
+on_failure: &on_failure
+  log_artifacts:
+path: "**/**.log"
+type: text/plain
+  regress_diffs_artifacts:
+path: "**/**.diffs"
+type: text/plain
+  tap_artifacts:
+path: "**/regress_log_*"
+type: text/plain
+
+
+task:
+  name: FreeBSD - 13
+
+  env:
+# FreeBSD on GCP is slow when running with larger number of CPUS /
+# jobs. Using one more job than cpus seems to work best.
+CPUS: 2
+BUILD_JOBS: 3
+TEST_JOBS: 3
+
+CCACHE_DIR: /tmp/ccache_dir
+
+  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*freebsd.*'
+
+  compute_engine_instance:
+image_project: $IMAGE_PROJECT
+image: family/pg-ci-freebsd-13-0
+platform: freebsd
+cpu: $CPUS
+memory: 2G
+disk: 50
+
+  sysinfo_script: |
+id
+uname -a
+ulimit -a -H
+ulimit -a -S
+export
+
+  ccache_cache:
+folder: $CCACHE_DIR
+  # Work around performance issues due to the 32KB block size
+  repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
+  create_user_script: |
+pw useradd postgres
+chown -R postgres:postgres .
+mkdir -p ${CCACHE_DIR}
+chown -R postgres:postgres ${CCACHE_DIR}
+  setup_cores_script: |
+mkdir -m 770 /tmp/cores
+chown root:postgres /tmp/cores
+sysctl kern.corefile='/tmp/cores/%N.%P.core'
+
+  # NB: Intentionally building without --with-llvm. The freebsd image size is
+  # already large enough to make task startup slow, and freebsd already takes
+  # longer than other platforms except for windows.
+  configure_script: |
+su postgres <<-EOF
+  ./configure \
+--enable-cassert --enable-debug --enable-tap-tests \
+--enable-nls \
+\
+--with-gssapi \
+--with-icu \
+--with-ldap \
+--with-libxml \
+--with-libxslt \
+--with-lz4 \
+--with-pam \
+--with-perl \
+--with-python \
+--with-ssl=openssl \
+--with-tcl --with-tclconfig=/usr/local/lib/tcl8.6/ \
+--with-uuid=bsd \
+\
+--with-includes=/usr/local/include \
+--with-libs=/usr/local/lib \
+\
+CC="ccache cc" CFLAGS="-O0 -ggdb"
+EOF
+  

Re: Add index scan progress to pg_stat_progress_vacuum

2021-12-20 Thread Peter Geoghegan
On Wed, Dec 1, 2021 at 2:59 PM Imseih (AWS), Sami  wrote:
> The current implementation of pg_stat_progress_vacuum does not provide 
> progress on which index is being vacuumed making it difficult for a user to 
> determine if the "vacuuming indexes" phase is making progress.

I notice that your patch largely assumes that indexes can be treated
like heap relations, in the sense that they're scanned sequentially,
and process each block exactly once (or exactly once per "pass"). But
that isn't quite true. There are a few differences that seem like they
might matter:

* An ambulkdelete() scan of an index cannot take the size of the
relation once, at the start, and ignore any blocks that are added
after the scan begins. And so the code may need to re-establish the
total size of the index multiple times, to make sure no index tuples
are missed -- there may be index tuples that VACUUM needs to process
that appear in later pages due to concurrent page splits. You don't
have the issue with things like IndexBulkDeleteResult.num_pages,
because they report on the index after ambulkdelete/amvacuumcleanup
return (they're not granular progress indicators).

* Some index AMs don't work like nbtree and GiST in that they cannot
do their scan sequentially -- they have to do something like a
logical/keyspace order scan instead, which is *totally* different to
heapam (not just a bit different). There is no telling how many times
each page will be accessed in these other index AMs, and in what
order, even under optimal conditions. We should arguably not even try
to provide any granular progress information here, since it'll
probably be too messy.

I'm not sure what to recommend for your patch, in light of this. Maybe
you should change the names of the new columns to own the squishiness.
For example, instead of using the name index_blks_total, you might
instead use the name index_blks_initial. That might be enough to avoid
user confusion when we scan more blocks than the index initially
contained (within a single ambulkdelete scan).

Note also that we have to do something called backtracking in
btvacuumpage(), which you've ignored -- that's another reasonably
common way that we'll end up scanning a page twice. But that probably
should just be ignored -- it's too narrow a case to be worth caring
about.

-- 
Peter Geoghegan




Re: sqlsmith: ERROR: XX000: bogus varno: 2

2021-12-20 Thread Robert Haas
On Mon, Dec 20, 2021 at 1:13 PM Tom Lane  wrote:
> The reason the regression tests fail if I only patch ruleutils is
> that psql \d on a partitioned table invokes
> ... pg_get_expr(c.relpartbound, c.oid) FROM pg_catalog.pg_class c
> and evidently relpartbound does contain precisely these node types.

Right. I'm not surprised that relpartbound uses those node types. I
*am* surprised that pg_get_expr() is expected to be able to handle
them. IOW, they ARE node trees, consonant with the fact that the
column type is pg_node_tree, but they're NOT expressions.

If we're going to have a policy that all node types stored in the
catalog should be supported by expression_tree_walker even if they're
not actually expressions, we ought to have a rather explicit comment
about that in the comments for expression_tree_walker, because
otherwise somebody might easily make this same mistake again.
Alternatively, maybe pg_get_expr() should just fail and tell you that
this is not an expression, and if you want to see what's in that
column, you should use the SQL-callable functions specifically
provided for that purpose (pg_get_partkeydef, I think). I don't know
why it should be legitimate for pg_get_expr() to just assume that any
random node tree it gets handed must be an expression without doing
any sanity checking.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add index scan progress to pg_stat_progress_vacuum

2021-12-20 Thread Peter Geoghegan
On Wed, Dec 15, 2021 at 2:10 PM Bossart, Nathan  wrote:
> nitpick: Shouldn't index_blks_scanned be index_blks_vacuumed?  IMO it
> is more analogous to heap_blks_vacuumed.

+1.

> This will tell us which indexes are currently being vacuumed and the
> current progress of those operations, but it doesn't tell us which
> indexes have already been vacuumed or which ones are pending vacuum.

VACUUM will process a table's indexes in pg_class OID order (outside
of parallel VACUUM, I suppose). See comments about sort order above
RelationGetIndexList().

Anyway, it might be useful to add ordinal numbers to each index, that
line up with this processing/OID order. It would also be reasonable to
display the same number in log_autovacuum* (and VACUUM VERBOSE)
per-index output, to reinforce the idea. Note that we don't
necessarily display a distinct line for each distinct index in this
log output, which is why including the ordinal number there makes
sense.

> I wish option #1 was cleaner, because I think it would be really nice
> to have all this information in a single row.

I do too. I agree with the specific points you raise in your remarks
about what you've called options #2 and #3, but those options still
seem unappealing to me.

-- 
Peter Geoghegan




Re: sqlsmith: ERROR: XX000: bogus varno: 2

2021-12-20 Thread Tom Lane
Robert Haas  writes:
> The commit that added PartitionBoundSpec and PartitionRangeDatum was
> committed by me and authored by Amit Langote. It is the original table
> partitioning commit -- f0e44751d7175fa3394da2c8f85e3ceb3cdbfe63. I'm
> reasonably sure that the reason why those didn't get added to
> expression_tree_walker is that they don't seem like something that can
> ever appear in an expression. I still don't understand why that's not
> true.

The reason the regression tests fail if I only patch ruleutils is
that psql \d on a partitioned table invokes
... pg_get_expr(c.relpartbound, c.oid) FROM pg_catalog.pg_class c
and evidently relpartbound does contain precisely these node types.

regards, tom lane




Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2021-12-20 Thread Mark Dilger



> On Dec 20, 2021, at 7:37 AM, Pavel Borisov  wrote:
> 
> The patch is pgindented and rebased on the current PG master code.

Thank you, Pavel.


The tests in check_btree.sql no longer create a bttest_unique table, so the 
DROP TABLE is surplusage:

+DROP TABLE bttest_unique;
+ERROR:  table "bttest_unique" does not exist


The changes in pg_amcheck.c to pass the new checkunique parameter will likely 
need to be based on a amcheck version check.  The implementation of 
prepare_btree_command() in pg_amcheck.c should be kept compatible with older 
versions of amcheck, because it connects to remote servers and you can't know 
in advance that the remote servers are as up-to-date as the machine where 
pg_amcheck is installed.  I'm thinking specifically about this change:

@@ -871,7 +877,8 @@ prepare_btree_command(PQExpBuffer sql, RelationInfo *rel, PGconn *conn)
 	if (opts.parent_check)
 		appendPQExpBuffer(sql,
 						  "SELECT %s.bt_index_parent_check("
-						  "index := c.oid, heapallindexed := %s, rootdescend := %s)"
+						  "index := c.oid, heapallindexed := %s, rootdescend := %s, "
+						  "checkunique := %s)"
 						  "\nFROM pg_catalog.pg_class c, pg_catalog.pg_index i "
 						  "WHERE c.oid = %u "
 						  "AND c.oid = i.indexrelid "

If the user calls pg_amcheck with --checkunique, and one or more remote servers 
have an amcheck version < 1.4, at a minimum you'll need to avoid calling 
bt_index_parent_check with that parameter, and probably also you'll either need 
to raise a warning or perhaps an error telling the user that such a check 
cannot be performed.


You've forgotten to include contrib/amcheck/amcheck--1.3--1.4.sql in the v5 
patch, resulting in a failed install:

/usr/bin/install -c -m 644 ./amcheck--1.3--1.4.sql ./amcheck--1.2--1.3.sql 
./amcheck--1.1--1.2.sql ./amcheck--1.0--1.1.sql ./amcheck--1.0.sql  
'/Users/mark.dilger/hydra/unique_review.5/tmp_install/Users/mark.dilger/pgtest/test_install/share/postgresql/extension/'
install: ./amcheck--1.3--1.4.sql: No such file or directory
make[2]: *** [install] Error 71
make[1]: *** [checkprep] Error 2

Using the one from the v4 patch fixes the problem.  Please include this file in 
v6, to simplify the review process.


The changes to t/005_opclass_damage.pl look ok.  The creation of a new table 
for the new test seems unnecessary, but only problematic in that it makes the 
test slightly longer to read.  I recommend changing the test to use the same 
table that the prior test uses, but that is just a recommendation, not a 
requirement.

You should add coverage for --checkunique to t/003_check.pl.

You should add coverage for multiple PostgreSQL::Test::Cluster instances 
running differing versions of amcheck, perhaps some on version 1.3 and some on 
version 1.4.  Then test that the --checkunique option works adequately.


I have not reviewed the changes to verify_nbtree.c.  I'll leave that to Peter.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: sqlsmith: ERROR: XX000: bogus varno: 2

2021-12-20 Thread Robert Haas
On Mon, Dec 20, 2021 at 11:25 AM Tom Lane  wrote:
> I figured this would be just a quick hack in ruleutils.c, but was
> dismayed to find the regression tests falling over, because some
> bozo neglected to teach nodeFuncs.c about partition expressions.
> It might be a good idea to back-patch that part, before we find
> some other place that fails.

Calling people bozos isn't very nice. Please don't do that.

The commit that added PartitionBoundSpec and PartitionRangeDatum was
committed by me and authored by Amit Langote. It is the original table
partitioning commit -- f0e44751d7175fa3394da2c8f85e3ceb3cdbfe63. I'm
reasonably sure that the reason why those didn't get added to
expression_tree_walker is that they don't seem like something that can
ever appear in an expression. I still don't understand why that's not
true.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Unifying VACUUM VERBOSE and log_autovacuum_min_duration output

2021-12-20 Thread Peter Geoghegan
On Mon, Nov 29, 2021 at 6:51 PM Peter Geoghegan  wrote:
> Attached is a WIP patch doing this.

This has bitrot, so I attach v2, mostly just to keep the CFTester
status green. The only real change is one minor simplification to how
we set everything up, inside heap_vacuum_rel().

-- 
Peter Geoghegan


v2-0001-Unify-VACUUM-VERBOSE-and-log_autovacuum_min_durat.patch
Description: Binary data


Re: Support for NSS as a libpq TLS backend

2021-12-20 Thread Joshua Brindle
On Wed, Dec 15, 2021 at 5:05 PM Daniel Gustafsson  wrote:
>
> > On 25 Nov 2021, at 14:39, Joshua Brindle  
> > wrote:
> > On Wed, Nov 24, 2021 at 8:49 AM Joshua Brindle
> >  wrote:
> >>
> >> On Wed, Nov 24, 2021 at 8:46 AM Joshua Brindle
> >>  wrote:
>
> >> I don't know enough about NSS to know if this is problematic or not
> >> but if I try verify-full without having the root CA in the certificate
> >> store I get:
> >>
> >> $ /usr/pgsql-15/bin/psql "host=localhost sslmode=verify-full user=postgres"
> >> psql: error: SSL error: Issuer certificate is invalid.
> >> unable to shut down NSS context: NSS could not shutdown. Objects are
> >> still in use.
>
> Fixed.
>
> > Something is strange with ssl downgrading and a bad ssldatabase
> > [postgres@11cdfa30f763 ~]$ /usr/pgsql-15/bin/psql "ssldatabase=oops
> > sslcert=client_cert host=localhost"
> > Password for user postgres:
> >
> > 
>
> Also fixed.
>
> > On the server side:
> > 2021-11-25 01:52:01.984 UTC [269] LOG:  unable to handshake:
> > Encountered end of file (PR_END_OF_FILE_ERROR)
>
> This is normal and expected, but to make it easier on users I've changed this
> error message to be aligned with the OpenSSL implementation.
>
> > Other than that and I still haven't tested --with-llvm I've gotten
> > everything working, including with an openssl client. Attached is a
> > dockerfile that gets to the point where a client can connect with
> > clientcert=verify-full. I've removed some of the old cruft and
> > debugging from the previous versions.
>
> Very cool, thanks!  I've been unable to reproduce any issues with llvm but 
> I'll
> keep poking at that.  A new version will be posted shortly with the above and 
> a
> few more fixes.

For v50 this change was required for an llvm build to succeed on my
Fedora system:

diff --git a/configure b/configure
index 25388a75a2..62d554806a 100755
--- a/configure
+++ b/configure
@@ -13276,6 +13276,7 @@ fi

   LDFLAGS="$LDFLAGS $NSS_LIBS $NSPR_LIBS"
   CFLAGS="$CFLAGS $NSS_CFLAGS $NSPR_CFLAGS"
+  CPPFLAGS="$CPPFLAGS $NSS_CFLAGS $NSPR_CFLAGS"


 $as_echo "#define USE_NSS 1" >>confdefs.h

I'm not certain why configure didn't already have that, configure.ac
appears to, but nonetheless it builds, all tests succeed, and a quick
tire kicking looks good.

Thank you.




Re: PublicationActions - use bit flags.

2021-12-20 Thread Alvaro Herrera
On 2021-Dec-20, Peter Eisentraut wrote:

> I don't see why this is better.  It just makes the code longer and adds more
> punctuation and reduces type safety.

Removing one palloc is I think the most important consequence ...
probably not a big deal though.

I think we could change the memcpy calls to struct assignment, as that
would look a bit cleaner, and call it a day.

One thing I would not like would be to change the catalog representation
from bools into an integer.  We do that for pg_trigger.tgflags (IIRC)
and it is horrible.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/




Re: PublicationActions - use bit flags.

2021-12-20 Thread Tom Lane
Peter Eisentraut  writes:
> On 20.12.21 01:18, Peter Smith wrote:
>> I felt it is more natural to implement boolean flag combinations using
>> a bitmask instead of a struct of bools. IMO using the bitmask also
>> simplifies assignment and checking of said flags.

> I don't see why this is better.  It just makes the code longer and adds 
> more punctuation and reduces type safety.

It makes the code shorter in places where you need to process all the
flags at once, but I agree it's not really an improvement elsewhere.
Not sure if it's worth changing.

One thing I noted is that the duplicate PublicationActions typedefs
will certainly draw warnings, if not hard errors, from some compilers.
You could get around that by removing the typedefs altogether and just
using "int", which'd be more consistent with our usual practices anyway.
But it does play into Peter's objection about type safety.

regards, tom lane




Re: Getting rid of regression test input/ and output/ files

2021-12-20 Thread Andrew Dunstan


On 12/18/21 18:53, Tom Lane wrote:
>
> 2. Export the values from pg_regress as environment variables,
> and then add a way for the test scripts to read those variables.
> I was a bit surprised to realize that we didn't have any way
> to do that already --- psql has \setenv, so why did we never
> invent \getenv?


I don't recall anyone expressing a need for it at the time we added
\setenv.

+1 for adding it now.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: sequences vs. synchronous replication

2021-12-20 Thread Tomas Vondra

On 12/20/21 15:31, Peter Eisentraut wrote:
> On 18.12.21 22:48, Tomas Vondra wrote:
>> What do you mean by "not caching unused sequence numbers"? Reducing
>> SEQ_LOG_VALS to 1, i.e. WAL-logging every sequence increment?
>>
>> That'd work, but I wonder how significant the impact will be. I'd bet
>> it hurts the patch adding logical decoding of sequences quite a bit.
>
> It might be worth testing.  This behavior is ancient and has never
> really been scrutinized since it was added.

OK, I'll do some testing to measure the overhead, and I'll see how much
it affects the sequence decoding patch.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: sqlsmith: ERROR: XX000: bogus varno: 2

2021-12-20 Thread Tom Lane
Robert Haas  writes:
> On Sun, Dec 19, 2021 at 4:17 PM Tom Lane  wrote:
>> I don't see anything particularly surprising here.  pg_get_expr is only
>> able to cope with expression trees over a single relation, but ON UPDATE
>> rules can refer to both OLD and NEW relations.  Maybe we could make the
>> error message more friendly, but there's not much else to be done,
>> I think.

> +1 for making the error message more friendly.

The problem is that the spot where it's thrown doesn't have a lot of
context.  We can fix that by having pg_get_expr itself check for
out-of-spec Vars before starting the recursion, which adds a bit of
overhead but I don't think we're terribly concerned about that.

I figured this would be just a quick hack in ruleutils.c, but was
dismayed to find the regression tests falling over, because some
bozo neglected to teach nodeFuncs.c about partition expressions.
It might be a good idea to back-patch that part, before we find
some other place that fails.

regards, tom lane

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index e276264882..5d4700430c 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2201,6 +2201,26 @@ expression_tree_walker(Node *node,
 					return true;
 			}
 			break;
+		case T_PartitionBoundSpec:
+			{
+				PartitionBoundSpec *pbs = (PartitionBoundSpec *) node;
+
+				if (walker(pbs->listdatums, context))
+					return true;
+				if (walker(pbs->lowerdatums, context))
+					return true;
+				if (walker(pbs->upperdatums, context))
+					return true;
+			}
+			break;
+		case T_PartitionRangeDatum:
+			{
+				PartitionRangeDatum *prd = (PartitionRangeDatum *) node;
+
+				if (walker(prd->value, context))
+					return true;
+			}
+			break;
 		case T_List:
 			foreach(temp, (List *) node)
 			{
@@ -3092,6 +3112,28 @@ expression_tree_mutator(Node *node,
 				return (Node *) newnode;
 			}
 			break;
+		case T_PartitionBoundSpec:
+			{
+				PartitionBoundSpec *pbs = (PartitionBoundSpec *) node;
+				PartitionBoundSpec *newnode;
+
+				FLATCOPY(newnode, pbs, PartitionBoundSpec);
+				MUTATE(newnode->listdatums, pbs->listdatums, List *);
+				MUTATE(newnode->lowerdatums, pbs->lowerdatums, List *);
+				MUTATE(newnode->upperdatums, pbs->upperdatums, List *);
+				return (Node *) newnode;
+			}
+			break;
+		case T_PartitionRangeDatum:
+			{
+				PartitionRangeDatum *prd = (PartitionRangeDatum *) node;
+				PartitionRangeDatum *newnode;
+
+				FLATCOPY(newnode, prd, PartitionRangeDatum);
+				MUTATE(newnode->value, prd->value, Node *);
+				return (Node *) newnode;
+			}
+			break;
 		case T_List:
 			{
 /*
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 8da525c715..5d9596d445 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -2622,6 +2625,7 @@ static text *
 pg_get_expr_worker(text *expr, Oid relid, const char *relname, int prettyFlags)
 {
 	Node	   *node;
+	Relids		relids;
 	List	   *context;
 	char	   *exprstr;
 	char	   *str;
@@ -2634,6 +2638,27 @@ pg_get_expr_worker(text *expr, Oid relid, const char *relname, int prettyFlags)
 
 	pfree(exprstr);
 
+	/*
+	 * Throw a user-friendly error if the expression contains Vars we won't be
+	 * able to deparse.  pull_varnos requires a PlannerInfo, but fortunately
+	 * that can be dummy -- it need only contain an empty placeholder_list.
+	 */
+	relids = pull_varnos(makeNode(PlannerInfo), node);
+	if (OidIsValid(relid))
+	{
+		if (!bms_is_subset(relids, bms_make_singleton(1)))
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+	 errmsg("expression contains variables of more than one relation")));
+	}
+	else
+	{
+		if (!bms_is_empty(relids))
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+	 errmsg("expression contains variables")));
+	}
+
 	/* Prepare deparse context if needed */
 	if (OidIsValid(relid))
 		context = deparse_context_for(relname, relid);


Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2021-12-20 Thread Pavel Borisov
>>> I completely agree that checking uniqueness requires looking at the
>>> heap, but I don't agree that every caller of bt_index_check on an index
>>> wants that particular check to be performed.  There are multiple ways in
>>> which an index might be corrupt, and Peter wrote the code to only check
>>> some of them by default, with options to expand the checks to other
>>> things.  This is why heapallindexed is optional.  If you don't want to
>>> pay the price of checking all entries in the heap against the btree,
>>> you don't have to.
>>
>> I've got the idea and revised the patch accordingly. Thanks!
>> Pfa v4 of a patch. I've added an optional argument to allow uniqueness
>> checks for the unique indexes.
>> Also, I added a test variant to make them work on 32-bit systems.
>> Unfortunately, converting the regression test to TAP would be a pain for
>> me. Hope it can be used now as a 2-variant regression test for 32 and 64
>> bit systems.
>>
>> Thank you for your consideration!
>>
>> --
>> Best regards,
>> Pavel Borisov
>>
>> Postgres Professional: http://postgrespro.com
>
> Looking over v4, here are my review comments...

Mark and Peter, big thanks for your ideas!

I had little time to work on this feature until recently, but finally, I've
elaborated v5 patch (PFA)
It contains the following improvements, most of which are based on your
consideration:

- Amcheck tests are reworked into TAP-tests with "break the warranty" by
comparison function changes in the opclass instead of pg_index update.
Mark, again thanks for the sample!
- Added new --checkunique option into pg_amcheck.
- Added documentation and tests for amcheck and pg_amcheck new functions.
- Results are output at ERROR log level like it is done in the other
amcheck tests.
- The rare case where a check is impossible -- the first entry on a leaf
page being both (1) equal to the last one on the previous page and (2)
deleted in the heap -- is demoted to DEBUG1 log level. In this, I followed
Peter's consideration that amcheck should do its best to check, but can
not always verify everything. The case is expected to be extremely rare.
- Made page connectivity checks based on btpo_next (and corrected a bug in
the code that I found meanwhile).
- If a snapshot is already taken in the heapallindexed case, reuse it for
the unique constraint check.

The patch is pgindented and rebased on the current PG master code.
I'd like to re-attach the patch v5 to the upcoming CF if you don't mind.

I also want to add that some customers face index uniqueness
violations more often than I've expected, and I hope this check could also
help some other PostgreSQL customers.

Your further considerations are welcome as always!
-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


v5-0001-Add-option-for-amcheck-and-pg_amcheck-to-check-un.patch
Description: Binary data


Re: psql format output

2021-12-20 Thread Alvaro Herrera
On 2021-Dec-20, Tom Lane wrote:

> -[ RECORD 1 ]-------+---------------------------------------------------------------------------
> Schema              | pg_catalog
> Name                | pg_copy_logical_replication_slot
> Result data type    | record
> Argument data types | src_slot_name name, dst_slot_name name, OUT slot_name name, OUT lsn pg_lsn

> The OP wants to fix that by inserting newlines in the "Argument data
> types" column, which'd help, but it seems to me to be mostly a kluge.
> That's prejudging a lot about how the output will be displayed.
> A more SQL-ish way to do things would be to turn the argument items
> into a set of rows.  I don't quite see how to make that work here,
> but maybe I'm just undercaffeinated as yet.

Maybe one way to improve on this is to have the server inject optional
line break markers (perhaps U+FEFF) that the client chooses whether or
not to convert into a physical line break, based on line length.  So in
\df we would use a query that emits such a marker after every comma and
then psql measures line width and crams as many items in each line as
will fit.  In the above example you would still have a single line for
arguments, because your terminal seems wide enough, but if you use a
smaller terminal then it'd be broken across several.

psql controls what happens because it owns the \df query anyway.  It
could be something simple like

pg_catalog.add_psql_linebreaks(pg_catalog.pg_get_function_arguments(p.oid))

if we don't want to intrude into pg_get_function_arguments itself.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Always assume the user will do much worse than the stupidest thing
you can imagine."(Julien PUYDT)




Re: psql format output

2021-12-20 Thread Tom Lane
Robert Haas  writes:
> It's hard to make any general judgment about how wide people's
> terminal windows are likely to be, but it is my opinion that the root
> of the problem is that \df+ just wants to display a whole lot of stuff
> - and as hackers add more function properties in the future, they're
> likely to get added in here as well. This output format doesn't scale
> nicely for that kind of thing, but it's unclear to me what would be
> any better.

I think the complaint is that even with \x mode, which fixes most
complaints of this sort, the arguments are still too wide:

-[ RECORD 1 ]-------+---------------------------------------------------------------------------
Schema              | pg_catalog
Name                | pg_copy_logical_replication_slot
Result data type    | record
Argument data types | src_slot_name name, dst_slot_name name, OUT slot_name name, OUT lsn pg_lsn
Type                | func
Volatility          | volatile
Parallel            | unsafe
Owner               | postgres
Security            | invoker
Access privileges   | 
Language            | internal
Source code         | pg_copy_logical_replication_slot_c
Description         | copy a logical replication slot

The OP wants to fix that by inserting newlines in the "Argument data
types" column, which'd help, but it seems to me to be mostly a kluge.
That's prejudging a lot about how the output will be displayed.
A more SQL-ish way to do things would be to turn the argument items
into a set of rows.  I don't quite see how to make that work here,
but maybe I'm just undercaffeinated as yet.

regards, tom lane




Re: Refactoring of compression options in pg_basebackup

2021-12-20 Thread Robert Haas
On Sat, Dec 18, 2021 at 6:29 AM Michael Paquier  wrote:
> This is one step toward the introduction of LZ4 in pg_basebackup, but
> this refactoring is worth doing on its own, hence a separate thread to
> deal with this problem first.  The options of pg_basebackup are
> reworked to be consistent with pg_receivewal, as follows:
> - --compress ranges now from 1 to 9, instead of 0 to 9.
> - --compression-method={none,gzip} is added, the default is none, same
> as HEAD.
> - --gzip/-z has the same meaning as before, being just a synonym of
> --compression-method=gzip with the default compression level of ZLIB
> assigned if there is no --compress.

One thing we should keep in mind is that I'm also working on adding
server-side compression, initially with gzip, but Jeevan Ladhe has
posted patches to extend that to LZ4. So however we structure the
options they should take that into account. Currently that patch set
adds --server-compression={none,gzip,gzipN} where N from 1 to 9, but
perhaps it should be done differently. Not sure.

> One more thing that I have noticed while hacking this stuff is that we
> have no regression tests for gzip with pg_basebackup, so I have added
> some that are skipped when not compiling the code with ZLIB.

If they don't decompress the backup and run pg_verifybackup on it then
I'm not sure how much they help. Yet, I don't know how to do that
portably.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add id's to various elements in protocol.sgml

2021-12-20 Thread Robert Haas
On Fri, Dec 17, 2021 at 6:54 PM Brar Piening  wrote:
> The purpose is that you can directly link to the id in the public html
> docs which still gets generated (e. g.
> https://www.postgresql.org/docs/14/protocol-replication.html#PROTOCOL-REPLICATION-BASE-BACKUP).
>
> Essentially it gives people discussing the protocol and pointing to a
> certain command or message format the chance to link to the very thing
> they are discussing instead of the top of the lengthy html page.

As a data point, this is something I have also wanted to do, from time
to time. I am generally of the opinion that any place the
documentation has a long list of things, we should add ids, so that
people can link to the particular thing in the list to which they want
to draw someone's attention.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Getting rid of regression test input/ and output/ files

2021-12-20 Thread Tom Lane
Peter Eisentraut  writes:
> On 19.12.21 00:53, Tom Lane wrote:
>> I was a bit surprised to realize that we didn't have any way
>> to do that already --- psql has \setenv, so why did we never
>> invent \getenv?

> You can do
> \set foo `echo $ENVVAR`
> but that's probably not portable enough for your purpose.

I suppose that wouldn't work on Windows, so no.

regards, tom lane




Re: Is my home $HOME or is it getpwent()->pw_dir ?

2021-12-20 Thread Chapman Flack
On 12/20/21 09:15, Peter Eisentraut wrote:
> On 18.12.21 21:57, Chapman Flack wrote:
>> When I start psql, strace shows $HOME being honored when looking
>> for .terminfo and .inputrc, and getpwent()->pw_dir being used
>> to look for .pgpass, .psqlrc, and .psql_history, which of course
>> aren't there.
>>
>> I'm sure the .terminfo and .inputrc lookups are being done by library code.
>> In my experience, it seems traditionally unixy to let $HOME take precedence.
> 
> See this patch: https://commitfest.postgresql.org/36/3362/

Wow, just a couple months ago. Yes, I should have tagged on to that
rather than starting a new thread.

I was proposing an option or variable on the assumption that just changing
the default behavior would be off the table. But I am +1 on just changing
the default behavior, if that's not off the table.

Regards,
-Chap

*seeing that RFC 5322 3.6.4 permits more than one msg-id for in-reply-to,
crosses fingers to see what PGLister will make of it*




Re: Confused comment about drop replica identity index

2021-12-20 Thread Euler Taveira
On Mon, Dec 20, 2021, at 8:11 AM, Michael Paquier wrote:
> On Mon, Dec 20, 2021 at 03:46:13AM +, wangw.f...@fujitsu.com wrote:
> > Here is a patch to correct wrong comment about
> > REPLICA_IDENTITY_INDEX, And improve the pg-doc.
> 
> That's mostly fine.  I have made some adjustments as per the
> attached.
Your patch looks good to me.

> Pondering more about this thread, I don't think we should change the
> existing behavior in the back-branches, but I don't have any arguments
> about doing such changes on HEAD to help the features being worked
> on, either.  So I'd like to apply and back-patch the attached, as a
> first step, to fix the inconsistency.
> 
What do you think about the attached patch? It forbids the DROP INDEX. We might
add a detail message but I didn't in this patch.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From 2ba00335746cf8348e019582a253721047f1a0ac Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Thu, 16 Dec 2021 16:54:47 -0300
Subject: [PATCH v1] Disallow dropping an index that is used by replica
 identity

Dropping an index that is used by a replica identity can break
replication for UPDATE and DELETE operations. This may be undesirable
from the usability standpoint. It introduces a behaviour change.
---
 src/backend/commands/tablecmds.c   | 14 +-
 src/test/regress/expected/replica_identity.out |  3 +++
 src/test/regress/sql/replica_identity.sql  |  3 +++
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index bf42587e38..727ad6625b 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1530,15 +1530,21 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
 	   rel->relname);
 
 	/*
+	 * An index used by a replica identity cannot be dropped because some
+	 * operations (UPDATE and DELETE) rely on the index properties
+	 * (uniqueness and NOT NULL) to record old values. These old values are
+	 * essential for identifying the row on the subscriber.
+	 *
 	 * Check the case of a system index that might have been invalidated by a
 	 * failed concurrent process and allow its drop. For the time being, this
 	 * only concerns indexes of toast relations that became invalid during a
 	 * REINDEX CONCURRENTLY process.
 	 */
-	if (IsSystemClass(relOid, classform) && relkind == RELKIND_INDEX)
+	if (relkind == RELKIND_INDEX)
 	{
 		HeapTuple	locTuple;
 		Form_pg_index indexform;
+		bool		indisreplident;
 		bool		indisvalid;
 
 		locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(relOid));
@@ -1550,8 +1556,14 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
 
 		indexform = (Form_pg_index) GETSTRUCT(locTuple);
 		indisvalid = indexform->indisvalid;
+		indisreplident = indexform->indisreplident;
 		ReleaseSysCache(locTuple);
 
+		if (indisreplident)
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+	 errmsg("could not drop index \"%s\" because it is required by the replica identity", rel->relname)));
+
 		/* Mark object as being an invalid index of system catalogs */
 		if (!indisvalid)
 			invalid_system_index = true;
diff --git a/src/test/regress/expected/replica_identity.out b/src/test/regress/expected/replica_identity.out
index e25ec06a84..56900451f7 100644
--- a/src/test/regress/expected/replica_identity.out
+++ b/src/test/regress/expected/replica_identity.out
@@ -227,6 +227,9 @@ Indexes:
 -- used as replica identity.
 ALTER TABLE test_replica_identity3 ALTER COLUMN id DROP NOT NULL;
 ERROR:  column "id" is in index used as replica identity
+-- DROP INDEX is not allowed if the index is used as REPLICA IDENTITY
+DROP INDEX test_replica_identity3_id_key;
+ERROR:  could not drop index "test_replica_identity3_id_key" because it is required by the replica identity
 DROP TABLE test_replica_identity;
 DROP TABLE test_replica_identity2;
 DROP TABLE test_replica_identity3;
diff --git a/src/test/regress/sql/replica_identity.sql b/src/test/regress/sql/replica_identity.sql
index 33da829713..f46769e965 100644
--- a/src/test/regress/sql/replica_identity.sql
+++ b/src/test/regress/sql/replica_identity.sql
@@ -98,6 +98,9 @@ ALTER TABLE test_replica_identity3 ALTER COLUMN id TYPE bigint;
 -- used as replica identity.
 ALTER TABLE test_replica_identity3 ALTER COLUMN id DROP NOT NULL;
 
+-- DROP INDEX is not allowed if the index is used as REPLICA IDENTITY
+DROP INDEX test_replica_identity3_id_key;
+
 DROP TABLE test_replica_identity;
 DROP TABLE test_replica_identity2;
 DROP TABLE test_replica_identity3;
-- 
2.20.1



Re: sqlsmith: ERROR: XX000: bogus varno: 2

2021-12-20 Thread Robert Haas
On Sun, Dec 19, 2021 at 4:17 PM Tom Lane  wrote:
> Justin Pryzby  writes:
> > I reduced the problematic query to this.
> > SELECT 1 FROM pg_rewrite WHERE
> > pg_get_function_arg_default(ev_class, 1) !~~
> > pg_get_expr(ev_qual, ev_class, false);
>
> Or more simply,
>
> regression=# select pg_get_expr(ev_qual, ev_class, false) from pg_rewrite 
> where rulename = 'pg_settings_u';
> ERROR:  bogus varno: 2
>
> I don't see anything particularly surprising here.  pg_get_expr is only
> able to cope with expression trees over a single relation, but ON UPDATE
> rules can refer to both OLD and NEW relations.  Maybe we could make the
> error message more friendly, but there's not much else to be done,
> I think.

+1 for making the error message more friendly.

(We would certainly have a difficult time making it less friendly.)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: psql format output

2021-12-20 Thread Robert Haas
On Fri, Dec 17, 2021 at 5:08 AM Peter Eisentraut
 wrote:
> That's a reasonable idea.  I wonder if it would work in general.  If
> someone had a C function (so no source code) with three arguments, they
> might be annoyed if it now displayed as three lines by default.

The problem I see is that each of those three lines would probably wrap.

For example, consider:

rhaas=# \df+ pg_copy_logical_replication_slot

In an 80-column window, the first non-header line of output looks like this:

 pg_catalog | pg_copy_logical_replication_slot | record   | src_slot_nam

Since we don't even fit the whole parameter name and data type in
there, never mind the rest of the columns, the proposed solution can't
help here. Each of the three output lines are over 300 characters.

When I full-screen my terminal window, it is 254 characters wide. So
if I were working full screen, then this proposal would cause that
output not to wrap when it otherwise would have done so. But if I were
working with a normal size window or even somewhat wider than normal,
it would just give me multiple wrapped lines.

It's hard to make any general judgment about how wide people's
terminal windows are likely to be, but it is my opinion that the root
of the problem is that \df+ just wants to display a whole lot of stuff
- and as hackers add more function properties in the future, they're
likely to get added in here as well. This output format doesn't scale
nicely for that kind of thing, but it's unclear to me what would be
any better.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: sequences vs. synchronous replication

2021-12-20 Thread Peter Eisentraut

On 18.12.21 22:48, Tomas Vondra wrote:
What do you mean by "not caching unused sequence numbers"? Reducing 
SEQ_LOG_VALS to 1, i.e. WAL-logging every sequence increment?


That'd work, but I wonder how significant the impact will be. I'd bet 
it hurts the patch adding logical decoding of sequences quite a bit.


It might be worth testing.  This behavior is ancient and has never 
really been scrutinized since it was added.





Re: speed up verifying UTF-8

2021-12-20 Thread John Naylor
On Fri, Dec 17, 2021 at 9:29 AM John Naylor
 wrote:
>
> I plan to push v25 early next week, unless there are further comments.

Pushed, thanks everyone!

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Is my home $HOME or is it getpwent()->pw_dir ?

2021-12-20 Thread Peter Eisentraut

On 18.12.21 21:57, Chapman Flack wrote:

I sometimes do some testing as nobody, on a distro where
getpwent(nobody)->pw_dir is a directory that nobody can't write.
So I end up setting $HOME to a directory that, um, is writable.

When I start psql, strace shows $HOME being honored when looking
for .terminfo and .inputrc, and getpwent()->pw_dir being used
to look for .pgpass, .psqlrc, and .psql_history, which of course
aren't there.

I'm sure the .terminfo and .inputrc lookups are being done by library code.
In my experience, it seems traditionally unixy to let $HOME take precedence.


See this patch: https://commitfest.postgresql.org/36/3362/




Re: Getting rid of regression test input/ and output/ files

2021-12-20 Thread Peter Eisentraut

On 19.12.21 00:53, Tom Lane wrote:

2. Export the values from pg_regress as environment variables,
and then add a way for the test scripts to read those variables.
I was a bit surprised to realize that we didn't have any way
to do that already --- psql has \setenv, so why did we never
invent \getenv?


You can do

\set foo `echo $ENVVAR`

but that's probably not portable enough for your purpose.




Re: PublicationActions - use bit flags.

2021-12-20 Thread Peter Eisentraut

On 20.12.21 01:18, Peter Smith wrote:

For some reason the current HEAD PublicationActions is a struct of
booleans representing combinations of the 4 different "publication
actions".

I felt it is more natural to implement boolean flag combinations using
a bitmask instead of a struct of bools. IMO using the bitmask also
simplifies assignment and checking of said flags.


I don't see why this is better.  It just makes the code longer and adds 
more punctuation and reduces type safety.





Re: Column Filtering in Logical Replication

2021-12-20 Thread Peter Eisentraut

On 17.12.21 22:07, Alvaro Herrera wrote:

So I've been thinking about this as a "security" item (you can see my
comments to that effect sprinkled all over this thread), in the sense
that if a publication "hides" some column, then the replica just won't
get access to it.  But in reality that's mistaken: the filtering that
this patch implements is done based on the queries that *the replica*
executes at its own volition; if the replica decides to ignore the list
of columns, it'll be able to get all columns.  All it takes is an
uncooperative replica in order for the lot of data to be exposed anyway.


During normal replication, the publisher should only send the columns 
that are configured to be part of the publication.  So I don't see a 
problem there.


During the initial table sync, the subscriber indeed can construct any 
COPY command.  We could maybe replace this with a more customized COPY 
command variant, like COPY table OF publication TO STDOUT.


But right now the subscriber is sort of assumed to have access to 
everything on the publisher anyway, so I doubt that this is the only 
problem.  But it's worth considering.





Re: Addition of --no-sync to pg_upgrade for test speedup

2021-12-20 Thread Alvaro Herrera
On 2021-Dec-16, Michael Paquier wrote:

> In pg_upgrade, we let the flush happen with initdb --sync-only, based
> on the binary path of the new cluster, so I think that we are not
> going to miss any test coverage by skipping that.

There was one patch of mine with breakage that only manifested in the
pg_upgrade test *because* of its lack of no-sync.  I'm afraid that this
change would hide certain problems.
https://postgr.es/m/20210130023011.n545o54j65t4k...@alap3.anarazel.de

> Thoughts or opinions?

I'm not 100% comfortable with this.  What can we do to preserve *some*
testing that include syncing?  Maybe some option that a few buildfarm
animals use?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"The Postgresql hackers have what I call a "NASA space shot" mentality.
 Quite refreshing in a world of "weekend drag racer" developers."
(Scott Marlowe)




RE: In-placre persistance change of a relation

2021-12-20 Thread Jakub Wartak
Hi Kyotaro,

> At Mon, 20 Dec 2021 17:39:27 +0900 (JST), Kyotaro Horiguchi
>  wrote in
> > At Mon, 20 Dec 2021 07:59:29 +, Jakub Wartak
> >  wrote in
> > > BTW fast feedback regarding that ALTER patch  (there were 4 unlogged
> tables):
> > > #  ALTER TABLE ALL IN TABLESPACE tbs1 set logged;
> > > WARNING:  unrecognized node type: 349
> >
> > lol  I met a server crash. Will fix. Thanks!
> 
> That crash vanished after a recompilation for me and I don't see that error.  
> On
> my dev env node# 349 is T_AlterTableSetLoggedAllStmt, which
> 0002 adds.  So perhaps make clean/make all would fix that.

The fastest I could - I've repeated the whole cycle about that one with fresh 
v9 (make clean, configure, make install, fresh initdb) and I've found two 
problems:

1) check-worlds seems OK but make -C src/test/recovery check shows a couple of 
failing tests here locally and in 
https://cirrus-ci.com/task/4699985735319552?logs=test#L807 :
t/009_twophase.pl  (Wstat: 256 Tests: 24 Failed: 1)
  Failed test:  21
  Non-zero exit status: 1
t/014_unlogged_reinit.pl   (Wstat: 512 Tests: 12 Failed: 2)
  Failed tests:  9-10
  Non-zero exit status: 2
t/018_wal_optimize.pl  (Wstat: 7424 Tests: 0 Failed: 0)
  Non-zero exit status: 29
  Parse errors: Bad plan.  You planned 38 tests but ran 0.
t/022_crash_temp_files.pl  (Wstat: 7424 Tests: 6 Failed: 0)
  Non-zero exit status: 29
  Parse errors: Bad plan.  You planned 9 tests but ran 6.

Test 018 made no sense to me. I tried to take a quick look, with wal_level=minimal, at why 
it is failing; it is a mystery to me, as the sequence seems to be pretty basic 
but the outcome is not:
~> cat repro.sql
create tablespace tbs1 location '/tbs1';
CREATE TABLE moved (id int);
INSERT INTO moved VALUES (1);
BEGIN;
ALTER TABLE moved SET TABLESPACE tbs1;
CREATE TABLE originated (id int);
INSERT INTO originated VALUES (1);
CREATE UNIQUE INDEX ON originated(id) TABLESPACE tbs1;
COMMIT;

~> psql -f repro.sql z3; sleep 1; /usr/pgsql-15/bin/pg_ctl -D 
/var/lib/pgsql/15/data -l logfile -m immediate stop
CREATE TABLESPACE
CREATE TABLE
INSERT 0 1
BEGIN
ALTER TABLE
CREATE TABLE
INSERT 0 1
CREATE INDEX
COMMIT
waiting for server to shut down done
server stopped
~> /usr/pgsql-15/bin/pg_ctl -D /var/lib/pgsql/15/data -l logfile start
waiting for server to start done
server started
z3# select * from moved;
ERROR:  could not open file "pg_tblspc/32834/PG_15_202112131/32833/32838": No 
such file or directory
z3=# select * from originated;
ERROR:  could not open file "base/32833/32839": No such file or directory
z3=# \dt+
  List of relations
 Schema |Name| Type  |  Owner   | Persistence |  Size   | Description
++---+--+-+-+-
 public | moved  | table | postgres | permanent   | 0 bytes |
 public | originated | table | postgres | permanent   | 0 bytes |

This happens even without placing anything on a tablespace at all {for the originated table, 
but not for the moved one}; either some major mishap is there (COMMIT should guarantee 
correctness) or I'm tired and having sloppy fingers.

2) A minor one, a test case; still, something is odd.

drop tablespace tbs1;
create tablespace tbs1 location '/tbs1';
CREATE UNLOGGED TABLE t4 (a int) tablespace tbs1;
CREATE UNLOGGED TABLE t5 (a int) tablespace tbs1;
CREATE UNLOGGED TABLE t6 (a int) tablespace tbs1;
CREATE TABLE t7 (a int) tablespace tbs1;
insert into t7 values (1);
insert into t5 values (1);
insert into t6 values (1);
\dt+
 List of relations
 Schema | Name | Type  |  Owner   | Persistence |Size| Description
+--+---+--+-++-
 public | t4   | table | postgres | unlogged| 0 bytes|
 public | t5   | table | postgres | unlogged| 8192 bytes |
 public | t6   | table | postgres | unlogged| 8192 bytes |
 public | t7   | table | postgres | permanent   | 8192 bytes |
(4 rows)

ALTER TABLE ALL IN TABLESPACE tbs1 set logged; 
==> STILL WARNING:  unrecognized node type: 349
\dt+
 List of relations
 Schema | Name | Type  |  Owner   | Persistence |Size| Description
+--+---+--+-++-
 public | t4   | table | postgres | permanent   | 0 bytes|
 public | t5   | table | postgres | permanent   | 8192 bytes |
 public | t6   | table | postgres | permanent   | 8192 bytes |
 public | t7   | table | postgres | permanent   | 8192 bytes |

So it did rewrite; however, this warning seems to be unfixed. I've tested on 
e2c52beecdea152ca680a22ef35c6a7da55aa30f.

-J.


Re: Multi-Column List Partitioning

2021-12-20 Thread Amit Langote
Hi,

On Mon, Dec 13, 2021 at 11:37 PM Ashutosh Sharma  wrote:
>
> Hi,
>
> Is this okay?
>
> postgres=# CREATE TABLE t1 (a int, b int) PARTITION BY LIST ( a, a, a );
> CREATE TABLE
>
> postgres=# CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN ((1, 2, 3), (4, 5, 
> 6));
> CREATE TABLE
>
> postgres=# \d t1
>Partitioned table "public.t1"
>  Column |  Type   | Collation | Nullable | Default
> +-+---+--+-
>  a  | integer |   |  |
>  b  | integer |   |  |
> Partition key: LIST (a, a, a)
> Number of partitions: 1 (Use \d+ to list them.)

I'd say it's not okay for a user to expect this to work sensibly, and
I don't think it would be worthwhile to write code to point that out
to the user if that is what you were implying.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




[PATCH] pg_stat_toast v0.3

2021-12-20 Thread Gunnar "Nick" Bluth

On 12.12.21 at 17:20, Gunnar "Nick" Bluth wrote:

Hello -hackers!

Please have a look at the attached patch, which implements some 
statistics for TOAST.


The attached v0.3 includes
* a proper GUC "track_toast" incl. postgresql.conf.sample line
* gathering timing information
* the system view "pg_stat_toast"
 * naming improvements more than welcome!
 * columns "storagemethod" and "compressmethod" added as per Hayato 
Kuroda's suggestion
* documentation (pointing out the potential impacts as per Andres 
Freund's reservations)


Any hints on how to write meaningful tests would be much appreciated!
I.e., will a test

INSERTing some long text to a column raises the view counts and 
"compressedsize" is smaller than "originalsize"


suffice?!?

Cheers & best regards,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato

From e4f6b8a9b398ff08f76e0c02ca7a1201062beb78 Mon Sep 17 00:00:00 2001
From: "Gunnar \"Nick\" Bluth" 
Date: Mon, 20 Dec 2021 14:09:36 +0100
Subject: [PATCH] pg_stat_toast v0.3


* adds timing information to stats and view
* adds system view
* adds documentation
* adds GUC parameter to postgresql.conf

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index afbb6c35e3..fa40befc16 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7668,6 +7668,32 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_toast (boolean)
+  
+   track_toast configuration parameter
+  
+  
+  
+   
+Enables tracking of TOAST activities.
+Compressions and externalizations are tracked.
+The default is off.
+Only superusers can change this setting.
+   
+
+   
+
+Be aware that this feature, depending on the number of TOASTable columns in
+your databases, may significantly increase the size of the statistics files
+and the workload of the statistics collector. It is recommended to activate
+this only temporarily, to assess the right compression and storage method
+for a column or columns.
+
+   
+  
+ 
+
  
   stats_temp_directory (string)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 62f2a3332b..32d7818096 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -610,6 +610,17 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   yet included in pg_stat_user_functions).
  
 
+ 
+  pg_stat_toastpg_stat_toast
+  
   One row for each column that has ever been TOASTed (compressed and/or externalized),
   showing the number of externalizations, compression attempts/successes, compressed and
   uncompressed sizes, etc.
+   
+   pg_stat_toast for details.
+  
+ 
+
  
   pg_stat_slrupg_stat_slru
   One row per SLRU, showing statistics of operations. See
@@ -4969,6 +4980,158 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  
 
+ 
+  pg_stat_toast
+
+  
+   pg_stat_toast
+  
+
+  
+   The pg_stat_toast view will contain
+   one row for each column of variable size that has been TOASTed since 
+   the last statistics reset. The  parameter
+   controls whether TOAST activities are tracked or not.
+  
+
+  
+   pg_stat_toast View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   schemaname name
+  
+  
+   Name of the schema the relation is in
+  
+ 
+
+ 
+  
+   reloid oid
+  
+  
+   OID of the relation
+  
+ 
+
+ 
+  
+   attnum int
+  
+  
+   Attribute (column) number in the relation
+  
+ 
+
+ 
+  
+   relname name
+  
+  
+   Name of the relation
+  
+ 
+
+ 
+  
+   attname name
+  
+  
+   Name of the attribute (column)
+  
+ 
+
+ 
+  
+   storagemethod char
+  
+  
+   Storage method of the attribute
+  
+ 
+
+ 
+  
+   externalized bigint
+  
+  
+   Number of times this attribute was externalized (pushed to TOAST relation)
+  
+ 
+
+ 
+  
+   compressmethod char
+  
+  
+   Compression method of the attribute (empty means default)
+  
+ 
+
+ 
+  
+   compressattempts bigint
+  
+  
+   Number of times this attribute was compressed
+  
+ 
+
+ 
+  
+   compresssuccesses bigint
+  
+  
+   Number of times the compression was successful (gained a size reduction)
+  
+ 
+
+ 
+  
+   compressedsize bigint
+  
+  
+   

Re: simplifying foreign key/RI checks

2021-12-20 Thread Amit Langote
On Mon, Dec 20, 2021 at 6:19 PM Zhihong Yu  wrote:
> On Sun, Dec 19, 2021 at 10:20 PM Amit Langote  wrote:
>>
>> On Mon, Dec 20, 2021 at 2:00 PM Corey Huinker  
>> wrote:
>> > Sorry for the delay. This patch no longer applies, it has some conflict 
>> > with d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a
>>
>> Thanks Corey for the heads up.  Rebased with some cosmetic adjustments.
>>
> Hi,
>
> +   Assert(partidx < 0 || partidx < partdesc->nparts);
> +   partoid = partdesc->oids[partidx];
>
> If partidx < 0, do we still need to fill out partoid and is_leaf ? It seems 
> we can return early based on (should call table_close(rel) first):
>
> +   /* No partition found. */
> +   if (partidx < 0)
> +   return NULL;

Good catch, thanks.  Patch updated.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v12-0001-Add-isolation-tests-for-snapshot-behavior-in-ri_.patch
Description: Binary data


v12-0002-Avoid-using-SPI-for-some-RI-checks.patch
Description: Binary data


Re: parallel vacuum comments

2021-12-20 Thread Masahiko Sawada
On Mon, Dec 20, 2021 at 1:08 PM Amit Kapila  wrote:
>
> On Mon, Dec 20, 2021 at 8:33 AM Masahiko Sawada  wrote:
> >
> > On Sat, Dec 18, 2021 at 3:38 PM Amit Kapila  wrote:
> > >
> > > Few comments:
> > > =
> > > 1.
> > > + * dead_items stores TIDs whose index tuples are deleted by index 
> > > vacuuming.
> > > + * Each TID points to an LP_DEAD line pointer from a heap page that has 
> > > been
> > > + * processed by lazy_scan_prune.  Also needed by lazy_vacuum_heap_rel, 
> > > which
> > > + * marks the same LP_DEAD line pointers as LP_UNUSED during second heap 
> > > pass.
> > >   */
> > > - LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
> > > + VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
> > >
> > > Isn't it better to keep these comments atop the structure VacDeadItems
> > > declaration?
> >
> > I think LP_DEAD and LP_UNUSED stuff are specific to heap. Given moving
> > VacDeadItems to vacuum.c, I thought it's better to keep it as generic
> > TID storage.
> >
>
> Okay, that makes sense.
>
> > >
> > > 2. What is the reason for not moving
> > > lazy_vacuum_one_index/lazy_cleanup_one_index to vacuum.c so that they
> > > can be called from vacuumlazy.c and vacuumparallel.c? Without this
> > > refactoring patch, I think both leader and workers set the same error
> > > context phase (VACUUM_ERRCB_PHASE_VACUUM_INDEX) during index
> > > vacuuming? Is it because you want a separate context phase for a
> > > parallel vacuum?
> >
> > Since the phases defined as VacErrPhase like
> > VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP etc.
> > and error callback function, vacuum_error_callback(), are specific to
> > heap, I thought it'd not be a good idea to move
> > lazy_vacuum/cleanup_one_index() so that both vacuumlazy.c and
> > vacuumparallel.c can use the phases and error callback function.
> >
>
> How about exposing it via heapam.h? We have already exposed a few
> things via heapam.h (see /* in heap/vacuumlazy.c */). In the current
> proposal, we need to have separate callbacks and phases for index
> vacuuming so that it can be used by both vacuumlazy.c and
> vacuumparallel.c which might not be a good idea.

Yeah, but if we expose VacErrPhase and vacuum_error_callback(), we
need to also expose LVRelState and vacuumparallel.c also uses it,
which seems not a good idea. So we will need to introduce a new struct
dedicated to the error callback function. Is that right?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: a misbehavior of partition row movement (?)

2021-12-20 Thread Amit Langote
On Thu, Oct 14, 2021 at 6:00 PM Amit Langote  wrote:
> On Mon, Sep 20, 2021 at 3:32 PM Amit Langote  wrote:
> > The problem was that the tuplestore
> > (afterTriggers.query_stack[query_level].tuplestore) that I decided to
> > use to store the AFTER trigger tuples of a partitioned table that is
> > the target of an cross-partition update lives only for the duration of
> > a given query.  So that approach wouldn't work if the foreign key
> > pointing into that partitioned table is marked INITIALLY DEFERRED.  To
> > fix, I added a List field to AfterTriggersData that stores the
> > tuplestores to store the tuples of partitioned tables that undergo
> > cross-partition updates in a transaction and are pointed to by
> > INITIALLY DEFERRED foreign key constraints.  I couldn't understand one
> > comment about why using a tuplestore for such cases *might not* work,
> > which as follows:
> >
> >  * Note that we need triggers on foreign tables to be fired in exactly the
> >  * order they were queued, so that the tuples come out of the tuplestore in
> >  * the right order.  To ensure that, we forbid deferrable (constraint)
> >  * triggers on foreign tables.  This also ensures that such triggers do not
> >  * get deferred into outer trigger query levels, meaning that it's okay to
> >  * destroy the tuplestore at the end of the query level.
> >
> > I tried to break the approach using various test cases (some can be
> > seen added by the patch to foreign_key.sql), but couldn't see the
> > issue alluded to in the above comment.  So I've marked the comment
> > with an XXX note as follows:
> >
> > - * Note that we need triggers on foreign tables to be fired in exactly the
> > - * order they were queued, so that the tuples come out of the tuplestore in
> > - * the right order.  To ensure that, we forbid deferrable (constraint)
> > - * triggers on foreign tables.  This also ensures that such triggers do not
> > - * get deferred into outer trigger query levels, meaning that it's okay to
> > - * destroy the tuplestore at the end of the query level.
> > + * Note that we need triggers on foreign and partitioned tables to be 
> > fired in
> > + * exactly the order they were queued, so that the tuples come out of the
> > + * tuplestore in the right order.  To ensure that, we forbid deferrable
> > + * (constraint) triggers on foreign tables.  This also ensures that such
> > + * triggers do not get deferred into outer trigger query levels, meaning 
> > that
> > + * it's okay to destroy the tuplestore at the end of the query level.
> > + * XXX - update this paragraph if the new approach, whereby tuplestores in
> > + * afterTriggers.deferred_tuplestores outlive any given query, can be 
> > proven
> > + * to not really break any assumptions mentioned here.
> >
> > If anyone reading this can think of the issue the original comment
> > seems to be talking about, please let me know.
>
> I brought this up in an off-list discussion with Robert and he asked
> why I hadn't considered not using tuplestores to remember the tuples
> in the first place, specifically pointing out that it may lead to
> unnecessarily consuming a lot of memory when such updates move a bunch
> of rows around.  Like, we could simply remember the tuples to be
> passed to the trigger function using their CTIDs as is done for normal
> (single-heap-relation) updates, though in this case also remember the
> OIDs of the source and the destination partitions of a particular
> cross-partition update.
>
> I had considered that idea before but I think I had overestimated the
> complexity of that approach so didn't go with it.  I tried and the
> resulting patch doesn't look all that complicated, with the added
> bonus that the bulk update case shouldn't consume so much memory with
> this approach like the previous tuplestore version would have.
>
> Updated patches attached.

Patch 0001 conflicted with some pg_dump changes that were recently
committed, so rebased.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v11-0002-Enforce-foreign-key-correctly-during-cross-parti.patch
Description: Binary data


v11-0001-Create-foreign-key-triggers-in-partitioned-table.patch
Description: Binary data


Re: Emit a warning if the extension's GUC is set incorrectly

2021-12-20 Thread Shinya Kato

On 2021-12-17 15:42, Peter Eisentraut wrote:

On 17.12.21 03:25, Shinya Kato wrote:

For now, I've attached the patch that fixed the compilation error.


I think it would be good if you could split the uncontroversial new
EmitErrorsOnPlaceholders() calls into a separate patch.  And please
add an explanation what exactly the rest of the patch changes.


Thank you for the comment!
I split the patch.

- v6-01-Add-EmitWarningsOnPlaceholders.patch
We should use EmitWarningsOnPlaceholders when we use 
DefineCustomXXXVariable.

I don't think there is any room for debate.

- v6-0002-Change-error-level-of-EmitWarningsOnPlaceholders.patch
This is a patch to change the error level of EmitWarningsOnPlaceholders 
from WARNING to ERROR.
I think it's OK to emit an ERROR, as we do when a wrong GUC is set for 
non-extensions; or, since it does not cause abnormal behavior, it can be 
left as a WARNING.

Thoughts?

- v6-0003-Emit-error-when-invalid-extensions-GUCs-are-set.patch
This is a patch to emit an error when invalid extension GUCs are set.
No test changes have been made, so the regression test will fail.
I have created a patch, but I don't think this is necessary because of 
the previous discussion.


--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/contrib/auth_delay/auth_delay.c b/contrib/auth_delay/auth_delay.c
index 5820ac328d..d11dd1e416 100644
--- a/contrib/auth_delay/auth_delay.c
+++ b/contrib/auth_delay/auth_delay.c
@@ -67,6 +67,9 @@ _PG_init(void)
 			NULL,
 			NULL,
 			NULL);
+
+	EmitWarningsOnPlaceholders("auth_delay");
+
 	/* Install Hooks */
 	original_client_auth_hook = ClientAuthentication_hook;
 	ClientAuthentication_hook = auth_delay_checks;
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index fb38135f7a..0407c7dd64 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -100,6 +100,8 @@ _PG_init(void)
 			 NULL,
 			 NULL,
 			 NULL);
+
+	EmitWarningsOnPlaceholders("pg_trgm");
 }
 
 /*
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 48c7417e6e..36555398ec 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -469,4 +469,6 @@ _PG_init(void)
 			   NULL,
 			   NULL,
 			   NULL);
+
+	EmitWarningsOnPlaceholders("postgres_fdw");
 }
diff --git a/contrib/sepgsql/hooks.c b/contrib/sepgsql/hooks.c
index 19a3ffb7ff..44a09c35a7 100644
--- a/contrib/sepgsql/hooks.c
+++ b/contrib/sepgsql/hooks.c
@@ -455,6 +455,8 @@ _PG_init(void)
 			 NULL,
 			 NULL);
 
+	EmitWarningsOnPlaceholders("sepgsql");
+
 	/* Initialize userspace access vector cache */
 	sepgsql_avc_init();
 
diff --git a/src/pl/tcl/pltcl.c b/src/pl/tcl/pltcl.c
index e11837559d..1389aa0943 100644
--- a/src/pl/tcl/pltcl.c
+++ b/src/pl/tcl/pltcl.c
@@ -474,6 +474,8 @@ _PG_init(void)
 			   PGC_SUSET, 0,
 			   NULL, NULL, NULL);
 
+	EmitWarningsOnPlaceholders("pltclu");
+
 	pltcl_pm_init_done = true;
 }
 
diff --git a/src/test/modules/delay_execution/delay_execution.c b/src/test/modules/delay_execution/delay_execution.c
index b3d0841ba8..8ec623ac52 100644
--- a/src/test/modules/delay_execution/delay_execution.c
+++ b/src/test/modules/delay_execution/delay_execution.c
@@ -91,6 +91,8 @@ _PG_init(void)
 			NULL,
 			NULL);
 
+	EmitWarningsOnPlaceholders("delay_execution");
+
 	/* Install our hook */
 	prev_planner_hook = planner_hook;
 	planner_hook = delay_execution_planner;
diff --git a/src/test/modules/ssl_passphrase_callback/ssl_passphrase_func.c b/src/test/modules/ssl_passphrase_callback/ssl_passphrase_func.c
index 6b0a3db104..3ba33e501c 100644
--- a/src/test/modules/ssl_passphrase_callback/ssl_passphrase_func.c
+++ b/src/test/modules/ssl_passphrase_callback/ssl_passphrase_func.c
@@ -48,6 +48,9 @@ _PG_init(void)
 			   NULL,
 			   NULL,
 			   NULL);
+
+	EmitWarningsOnPlaceholders("ssl_passphrase");
+
 	if (ssl_passphrase)
 		openssl_tls_init_hook = set_rot13;
 }
diff --git a/src/test/modules/worker_spi/worker_spi.c b/src/test/modules/worker_spi/worker_spi.c
index 0b6246676b..adb02d8cb8 100644
--- a/src/test/modules/worker_spi/worker_spi.c
+++ b/src/test/modules/worker_spi/worker_spi.c
@@ -322,6 +322,8 @@ _PG_init(void)
 			   0,
 			   NULL, NULL, NULL);
 
+	EmitWarningsOnPlaceholders("worker_spi");
+
 	/* set up common data for all our workers */
	memset(&worker, 0, sizeof(worker));
 	worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 7b03046301..5e9407c34a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -9333,6 +9333,13 @@ DefineCustomEnumVariable(const char *name,
	define_custom_variable(&var->gen);
 }
 
+/*
+ * For historical reasons, this function is named EmitWarningsOnPlaceholders,
+ * but it emits an error when an invalid custom GUC is set.
+ *
+ * If we use DefineCustomXXXVariable, it is 

Use JOIN USING aliases in ruleutils.c

2021-12-20 Thread Peter Eisentraut


When reverse-compiling a query, ruleutils.c has some complicated code to 
handle the join output columns of a JOIN USING join.  There used to be 
no way to qualify those columns, and so if there was a naming conflict 
anywhere in the query, those output columns had to be renamed to be 
unique throughout the query.


Since PostgreSQL 14, we have a new feature that allows adding an alias 
to a JOIN USING clause.  This provides a better solution to this 
problem.  This patch changes the logic in ruleutils.c so that if naming 
conflicts with JOIN USING output columns are found in the query, these 
JOIN USING aliases with generated names are attached everywhere and the 
columns are then qualified everywhere.


I made it so that new JOIN USING aliases are only created if needed in 
the query, since we already have the logic of has_dangerous_join_using() 
to compute when that is needed.  We could probably do away with that too 
and always use them, but I think that would be surprising and not what 
people want.


The regression test changes illustrate the effects very well.

This is PoC-level right now.  You will find blatant code duplication in 
set_rtable_names(), and for now I have only commented out some code that 
could be removed, not actually removed it.

From da15f7571f43e8eef279c3e66530d67ae9b5ef7f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 20 Dec 2021 11:37:15 +0100
Subject: [PATCH v1] Use JOIN USING aliases in ruleutils.c

When reverse-compiling a query, ruleutils.c has some complicated code
to handle the join output columns of a JOIN USING join.  There used to
be no way to qualify those columns, and so if there was a naming
conflict anywhere in the query, those output columns had to be renamed
to be unique throughout the query.

Since PostgreSQL 14, we have a new feature that allows adding an alias
to a JOIN USING clause.  This provides a better solution to this
problem.  This patch changes the logic in ruleutils.c so that if
naming conflicts with JOIN USING output columns are found in the
query, these JOIN USING aliases with generated names are attached
everywhere and the columns are then qualified everywhere.
---
 src/backend/utils/adt/ruleutils.c |  84 +-
 src/test/regress/expected/create_view.out | 194 +++---
 2 files changed, 178 insertions(+), 100 deletions(-)

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 8da525c715..6e8f7efc4e 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -157,13 +157,16 @@ typedef struct
 {
	List	   *rtable;			/* List of RangeTblEntry nodes */
	List	   *rtable_names;	/* Parallel list of names for RTEs */
+	List	   *join_using_aliases;	/* Parallel list of join using aliases */
	List	   *rtable_columns; /* Parallel list of deparse_columns structs */
	List	   *subplans;		/* List of Plan trees for SubPlans */
	List	   *ctes;			/* List of CommonTableExpr nodes */
	AppendRelInfo **appendrels; /* Array of AppendRelInfo nodes, or NULL */
	/* Workspace for column alias assignment: */
	bool		unique_using;	/* Are we making USING names globally unique */
+#if 0
	List	   *using_names;	/* List of assigned names for USING columns */
+#endif
	/* Remaining fields are used only when deparsing a Plan tree: */
	Plan	   *plan;			/* immediate parent of current expression */
	List	   *ancestors;		/* ancestors of plan */
@@ -3788,6 +3791,7 @@ set_rtable_names(deparse_namespace *dpns, List *parent_namespaces,
 {
		RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
		char	   *refname;
+		char	   *join_using_alias;
 
		/* Just in case this takes an unreasonable amount of time ... */
		CHECK_FOR_INTERRUPTS();
@@ -3818,6 +3822,16 @@ set_rtable_names(deparse_namespace *dpns, List *parent_namespaces,
			refname = rte->eref->aliasname;
		}
 
+		if (rte->rtekind == RTE_JOIN && rte->joinmergedcols)
+		{
+			if (rte->join_using_alias)
+				join_using_alias = rte->join_using_alias->aliasname;
+			else
+				join_using_alias = "ju";
+		}
+		else
+			join_using_alias = NULL;
+
		/*
		 * If the selected name isn't unique, append digits to make it so, and
		 * make a new hash entry for it once we've got a unique name.  For a
@@ -3865,7 +3879,49 @@ set_rtable_names(deparse_namespace *dpns, List *parent_namespaces,
			}
		}
 
+		if (join_using_alias)
+		{
+			

Re: pg_upgrade should truncate/remove its logs before running

2021-12-20 Thread Michael Paquier
On Fri, Dec 17, 2021 at 11:21:13AM -0600, Justin Pryzby wrote:
> I put this together in the simplest way, prefixing all the filenames with the
> configured path..

Well, why not.

> Another option is to chdir() into the given path.  But, pg_upgrade takes (and
> requires) a bunch of other paths, like -d -D -b -B, and those are 
> traditionally
> interpreted relative to CWD.  I could getcwd() and prefix all the -[dDbB] 
> with
> that, but prefixing a handful of binary/data paths is hardly better than
> prefixing a handful of dump/logfile paths.  I suppose that openat() isn't
> portable.  I don't think it's worth prohibiting relative paths, so I 
> can't
> think of any less-naive way to do this.

If we add a new file, git would quickly notice it and inform us about a
not-so-clean tree.  I would tend to prefer your approach here.
Relative paths can be useful.

> I didn't move the delete-old-cluster.sh, since that's intended to stay around
> even after a successful upgrade, as opposed to the other logs, which are
> typically removed at that point.

Makes sense to me.

+   log_opts.basedir = getenv("PG_UPGRADE_LOGDIR");
+   if (log_opts.basedir != NULL)
+   log_opts.basedir = strdup(log_opts.basedir);
+   else
+   log_opts.basedir = "pg_upgrade_log.d";
Why is this controlled with an environment variable?  It seems to me
that an option switch would be much better, no?  While tuning things,
we could choose something simpler for the default, like
"pg_upgrade_log".  I don't have a good history in naming new things,
though :)

.gitignore should be updated, I guess?  Besides, this patch has no
documentation.
--
Michael


signature.asc
Description: PGP signature


Re: Confused comment about drop replica identity index

2021-12-20 Thread Michael Paquier
On Mon, Dec 20, 2021 at 03:46:13AM +, wangw.f...@fujitsu.com wrote:
> Here is a patch to correct the wrong comment about
> REPLICA_IDENTITY_INDEX, and improve the pg-doc.

That's mostly fine.  I have made some adjustments as per the
attached.

+ The default for non-system tables. Records the old values of the 
columns
+ of the primary key, if any. The default for non-system tables. 
The same sentence is repeated twice.

+ Records no information about the old row.(This is the
default for system tables.)
For consistency with the rest, this could drop the parenthesis for the
second sentence.

+   USING INDEX index_name
This should use  as markup for index_name.

Pondering more about this thread, I don't think we should change the
existing behavior in the back-branches, but I don't have any arguments
about doing such changes on HEAD to help the features being worked
on, either.  So I'd like to apply and back-patch the attached, as a
first step, to fix the inconsistency.
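To make the corrected comment concrete, here is a sketch of the corner case being documented (object names are hypothetical):

```sql
ALTER TABLE t REPLICA IDENTITY USING INDEX t_key_idx;  -- relreplident = 'i'
DROP INDEX t_key_idx;
-- relreplident is still 'i', but with the index gone the effective
-- behavior is that of NOTHING ('n'), not DEFAULT ('d')
```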
--
Michael
From 05fe07224a5f31f3d28da088f302772724c00dd4 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 20 Dec 2021 20:09:59 +0900
Subject: [PATCH v2] Correct comment and documentation about
 REPLICA_IDENTITY_INDEX

---
 src/include/catalog/pg_class.h|  2 +-
 doc/src/sgml/ref/alter_table.sgml | 51 ++-
 2 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 93338d267c..b46299048e 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -182,7 +182,7 @@ DECLARE_INDEX(pg_class_tblspc_relfilenode_index, 3455, ClassTblspcRelfilenodeInd
 /*
  * an explicitly chosen candidate key's columns are used as replica identity.
  * Note this will still be set if the index has been dropped; in that case it
- * has the same meaning as 'd'.
+ * has the same meaning as 'n'.
  */
 #define		  REPLICA_IDENTITY_INDEX	'i'
 
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 8f14e4a5c4..a76e2e7322 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -872,16 +872,51 @@ WITH ( MODULUS numeric_literal, REM
  
   This form changes the information which is written to the write-ahead log
   to identify rows which are updated or deleted.  This option has no effect
-  except when logical replication is in use.  DEFAULT
-  (the default for non-system tables) records the
-  old values of the columns of the primary key, if any.  USING INDEX
-  records the old values of the columns covered by the named index, which
-  must be unique, not partial, not deferrable, and include only columns marked
-  NOT NULL.  FULL records the old values of all columns
-  in the row.  NOTHING records no information about the old row.
-  (This is the default for system tables.)
+  except when logical replication is in use.
   In all cases, no old values are logged unless at least one of the columns
   that would be logged differs between the old and new versions of the row.
+ 
+  
+   DEFAULT
+   
+
+ Records the old values of the columns of the primary key, if any.
+ This is the default for non-system tables. 
+
+   
+  
+
+  
+   USING INDEX index_name
+   
+
+ Records the old values of the columns covered by the named index,
+ that must be unique, not partial, not deferrable, and include only
+ columns marked NOT NULL. If this index is
+ dropped, the behavior is the same as NOTHING.
+
+   
+  
+
+  
+   FULL
+   
+
+ Records the old values of all columns in the row.
+
+   
+  
+
+  
+   NOTHING
+   
+
+ Records no information about the old row. This is the default for
+ system tables.
+
+   
+  
+ 
  
 

-- 
2.34.1



signature.asc
Description: PGP signature


Re: Allow DELETE to use ORDER BY and LIMIT/OFFSET

2021-12-20 Thread Yugo NAGATA
Hello Greg,

On Fri, 17 Dec 2021 01:40:45 -0500
Greg Stark  wrote:

> On Thu, 16 Dec 2021 at 22:18, Tom Lane  wrote:
> >
> > * If the sort order is underspecified, or you omit ORDER BY
> > entirely, then it's not clear which rows will be operated on.
> > The LIMIT might stop after just some of the rows in a peer
> > group, and you can't predict which ones.
> 
> Meh, that never seemed very compelling to me. I think that's on the
> user and there are *lots* of cases where the user can easily know
> enough extra context to know that's what she wants. In particular I've
> often wanted to delete one of two identical records and it would have
> been really easy to just do a DELETE LIMIT 1. I know there are ways to
> do it but it's always seemed like unnecessary hassle when there was a
> natural way to express it.

Out of curiosity, could you please tell me the concrete situations
where you wanted to delete one of two identical records?
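For context, the usual workaround Greg alludes to ("there are ways to do it") relies on the ctid system column (table and column names are illustrative):

```sql
-- delete exactly one of several identical rows, without DELETE ... LIMIT
DELETE FROM t
WHERE ctid = (SELECT ctid FROM t WHERE c = 2 LIMIT 1);
```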

> > * UPDATE/DELETE necessarily involve the equivalent of SELECT
> > FOR UPDATE, which may cause the rows to be ordered more
> > surprisingly than you expected, ie the sort happens *before*
> > rows are replaced by their latest versions, which might have
> > different sort keys.
> 
> This... is a real issue. Or is it? Just to be clear I think what
> you're referring to is a case like:
> 
> INSERT INTO t values (1),(2)
> 
> In another session: BEGIN; UPDATE t set c=0 where c=2
> 
> DELETE FROM t ORDER BY c ASC LIMIT 1
> 
> 
> In other session: COMMIT
> 
> Which row was deleted? In this case it was the row where c=1. Even
> though the UPDATE reported success (i.e. 1 row updated) so presumably
> it happened "before" the delete. The delete saw the ordering from
> before it was blocked and saw the ordering with c=1, c=2 so apparently
> it happened "before" the update.

When I tried it using my patch, the DELETE deleted the row where c=1,
the same as above, but it did not block. Is your result from an
experiment using my patch, or from another RDBMS like MySQL?
 
> There are plenty of other ways to see the same surprising
> serialization failure. If instead of a DELETE you did an UPDATE there
> are any number of functions or other features that have been added
> which can expose the order in which the updates happened.
> 
> The way to solve those cases would be to use serializable isolation
> (or even just repeatable read in this case). Wouldn't that also solve
> the DELETE serialization failure too?

Do you mean that such serialization failures would be avoided in
SERIALIZABLE or REPEATABLE READ by aborting the transaction, while such
serialization failures may be accepted in READ COMMITTED?
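For what it's worth, a sketch of the scenario under a stricter isolation level, using the DELETE syntax proposed in this thread (the outcome shown is the standard serialization-failure behavior):

```sql
-- session 2, run under REPEATABLE READ instead of READ COMMITTED:
BEGIN ISOLATION LEVEL REPEATABLE READ;
DELETE FROM t ORDER BY c ASC LIMIT 1;  -- blocks on session 1's uncommitted UPDATE
-- once session 1 commits, instead of silently deleting a stale row:
-- ERROR:  could not serialize access due to concurrent update
```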

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: row filtering for logical replication

2021-12-20 Thread Amit Kapila
On Mon, Dec 20, 2021 at 8:41 AM houzj.f...@fujitsu.com
 wrote:
>
> Thanks for the comments, I agree with all the comments.
> Attach the V49 patch set, which addressed all the above comments on the 0002
> patch.
>

A few comments/suggestions:
==
1.
+ Oid publish_as_relid = InvalidOid;
+
+ /*
+ * For a partition, if pubviaroot is true, check if any of the
+ * ancestors are published. If so, note down the topmost ancestor
+ * that is published via this publication, the row filter
+ * expression on which will be used to filter the partition's
+ * changes. We could have got the topmost ancestor when collecting
+ * the publication oids, but that will make the code more
+ * complicated.
+ */
+ if (pubform->pubviaroot && relation->rd_rel->relispartition)
+ {
+ if (pubform->puballtables)
+ publish_as_relid = llast_oid(ancestors);
+ else
+ publish_as_relid = GetTopMostAncestorInPublication(pubform->oid,
+ancestors);
+ }
+
+ if (publish_as_relid == InvalidOid)
+ publish_as_relid = relid;

I think you can initialize publish_as_relid as relid and then later
override it if required. That will save the additional check of
publish_as_relid.

2. I think your previous version of the code in GetRelationPublicationActions
was better, as now we have to call memcpy in two places.

3.
+
+ if (list_member_oid(GetRelationPublications(ancestor),
+ puboid) ||
+ list_member_oid(GetSchemaPublications(get_rel_namespace(ancestor)),
+ puboid))
+ {
+ topmost_relid = ancestor;
+ }

I think here we don't need to use braces ({}) as there is just a
single statement in the condition.

4.
+#define IDX_PUBACTION_n 3
+ ExprState*exprstate[IDX_PUBACTION_n]; /* ExprState array for row filter.
+One per publication action. */
..
..

I think we can have this define outside the structure. I don't like
this define name, can we name it NUM_ROWFILTER_TYPES or something like
that?

I think we can now merge 0001, 0002, and 0005. We are still evaluating
the performance for 0003, so it is better to keep it separate. We can
take the decision to merge it once we are done with our evaluation.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-12-20 Thread Peter Eisentraut

On 15.12.21 14:15, Gilles Darold wrote:

Le 15/12/2021 à 13:41, Peter Eisentraut a écrit :

On 03.08.21 19:10, Tom Lane wrote:

Gilles Darold  writes:

Sorry I have missed that, but I'm fine with this implemenation so let's
keep the v6 version of the patch and drop this one.


Pushed, then.  There's still lots of time to tweak the behavior of 
course.


I have a documentation follow-up to this.  It seems that these new 
functions are almost a de facto standard, whereas the SQL-standard 
functions are not implemented anywhere.  I propose the attached patch 
to update the subsection in the pattern-matching section to give more 
detail on this and suggest equivalent functions among these newly 
added ones.  What do you think?



I'm in favor to apply your changes to documentation. It is a good thing 
to precise the relation between this implementation of the regex_* 
functions and the SQL stardard.


ok, done




RE: Failed transaction statistics to measure the logical replication progress

2021-12-20 Thread osumi.takami...@fujitsu.com
Friday, December 17, 2021 2:03 PM Kyotaro Horiguchi  
wrote:
> It sends stats packets at every commit-like operation on apply workers.  The
> current pgstat is so smart that it refrains from sending stats packets at too 
> high
> frequency.  We already suffer frequent stats packets so apply workers need to
> behave the same way.
> 
> That is, the new stats numbers are first accumulated locally, then the
> accumulated numbers are sent to the stats collector by pgstat_report_stat.
Hi, Horiguchi-san.

I felt your point is absolutely right!
Updated the patch to address your concern.

Best Regards,
Takamichi Osumi



v18-0002-Extend-pg_stat_subscription_workers-to-include-g.patch
Description:  v18-0002-Extend-pg_stat_subscription_workers-to-include-g.patch


v18-0001-Fix-alignments-of-TAP-tests-for-pg_stat_subscrip.patch
Description:  v18-0001-Fix-alignments-of-TAP-tests-for-pg_stat_subscrip.patch


Re: simplifying foreign key/RI checks

2021-12-20 Thread Zhihong Yu
On Sun, Dec 19, 2021 at 10:20 PM Amit Langote 
wrote:

> On Mon, Dec 20, 2021 at 2:00 PM Corey Huinker 
> wrote:
> > Sorry for the delay. This patch no longer applies, it has some conflict
> with d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a
>
> Thanks Corey for the heads up.  Rebased with some cosmetic adjustments.
>
> Hi,

+   Assert(partidx < 0 || partidx < partdesc->nparts);
+   partoid = partdesc->oids[partidx];

If partidx < 0, do we still need to fill out partoid and is_leaf? It seems
we can return early based on (should call table_close(rel) first):

+   /* No partition found. */
+   if (partidx < 0)
+   return NULL;

Cheers


Re: In-placre persistance change of a relation

2021-12-20 Thread Kyotaro Horiguchi
At Mon, 20 Dec 2021 17:39:27 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Mon, 20 Dec 2021 07:59:29 +, Jakub Wartak  
> wrote in 
> > BTW fast feedback regarding that ALTER patch  (there were 4 unlogged 
> > tables):
> > #  ALTER TABLE ALL IN TABLESPACE tbs1 set logged;
> > WARNING:  unrecognized node type: 349
> 
> lol  I met a server crash. Will fix. Thanks!

That crash vanished after a recompilation for me and I don't see that
error.  On my dev env, node #349 is T_AlterTableSetLoggedAllStmt, which
0002 adds.  So perhaps make clean/make all would fix that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: row filtering for logical replication

2021-12-20 Thread tanghy.f...@fujitsu.com
On Monday, December 20, 2021 11:24 AM tanghy.f...@fujitsu.com 

> 
> On Wednesday, December 8, 2021 2:29 PM Amit Kapila
>  wrote:
> >
> > On Mon, Dec 6, 2021 at 6:04 PM Euler Taveira  wrote:
> > >
> > > On Mon, Dec 6, 2021, at 3:35 AM, Dilip Kumar wrote:
> > >
> > > On Mon, Dec 6, 2021 at 6:49 AM Euler Taveira  wrote:
> > > >
> > > > On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote:
> > > >
> > > > PS> I will update the commit message in the next version. I barely 
> > > > changed
> the
> > > > documentation to reflect the current behavior. I probably missed some
> > changes
> > > > but I will fix in the next version.
> > > >
> > > > I realized that I forgot to mention a few things about the UPDATE 
> > > > behavior.
> > > > Regardless of 0003, we need to define which tuple will be used to 
> > > > evaluate
> the
> > > > row filter for UPDATEs. We already discussed it circa [1]. This current 
> > > > version
> > > > chooses *new* tuple. Is it the best choice?
> > >
> > > But with 0003, we are using both the tuple for evaluating the row
> > > filter, so instead of fixing 0001, why we don't just merge 0003 with
> > > 0001?  I mean eventually, 0003 is doing what is the agreed behavior,
> > > i.e. if just OLD is matching the filter then convert the UPDATE to
> > > DELETE OTOH if only new is matching the filter then convert the UPDATE
> > > to INSERT.  Do you think that even we merge 0001 and 0003 then also
> > > there is an open issue regarding which row to select for the filter?
> > >
> > > Maybe I was not clear. IIUC we are still discussing 0003 and I would like 
> > > to
> > > propose a different default based on the conclusion I came up. If we 
> > > merged
> > > 0003, that's fine; this change will be useless. If we don't or it is 
> > > optional,
> > > it still has its merit.
> > >
> > > Do we want to pay the overhead to evaluating both tuple for UPDATEs? I'm 
> > > still
> > > processing if it is worth it. If you think that in general the row filter
> > > contains the primary key and it is rare to change it, it will waste cycles
> > > evaluating the same expression twice. It seems this behavior could be
> > > controlled by a parameter.
> > >
> >
> > I think the first thing we should do in this regard is to evaluate the
> > performance for both cases (when we apply a filter to both tuples vs.
> > to one of the tuples). In case the performance difference is
> > unacceptable, I think it would be better to still compare both tuples
> > as default to avoid data inconsistency issues and have an option to
> > allow comparing one of the tuples.
> >
> 
> I did some performance tests to see if 0003 patch has much overhead.
> With which I compared applying first two patches and applying first three 
> patches
> in four cases:
> 1) only old rows match the filter.
> 2) only new rows match the filter.
> 3) both old rows and new rows match the filter.
> 4) neither old rows nor new rows match the filter.
> 
> 0003 patch checks both old rows and new rows, and without 0003 patch, it only
> checks either old or new rows. We want to know whether it would take more time
> if we check the old rows.
> 
> I ran the tests in asynchronous mode and compared the SQL execution time. I 
> also
> tried some complex filters, to see if the difference could be more obvious.
> 
> The result and the script are attached.
> I didn’t see big difference between the result of applying 0003 patch and the
> one not in all cases. So I think 0003 patch doesn’t have much overhead.
> 

In the previous test, I ran it 3 times and took the average value, which may
be affected by performance fluctuations.

So, to make the results more accurate, I tested them more times (10 times) and
took the average value. The result is attached.

In general, I can see the time difference is within 3.5%, which is in a
reasonable performance range, I think.
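For readers following along, the UPDATE transformation being benchmarked can be sketched with the row-filter syntax proposed in this thread (table, column, and filter are illustrative):

```sql
CREATE PUBLICATION pub FOR TABLE t WHERE (c > 10);
-- case 2: the old row (c = 5) fails the filter, the new row (c = 15)
-- passes, so the UPDATE is replicated to the subscriber as an INSERT
UPDATE t SET c = 15 WHERE c = 5;
-- case 1: old row passes, new row fails -> replicated as a DELETE
UPDATE t SET c = 5 WHERE c = 15;
```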

Regards,
Tang


Re: [PATCH] pg_stat_toast

2021-12-20 Thread Gunnar "Nick" Bluth

On 20.12.2021 at 04:20, kuroda.hay...@fujitsu.com wrote:

Dear Gunnar,


Hi Kuroda-San!


postgres=# CREATE TABLE test (i int, lz4 text COMPRESSION lz4, std text);
postgres=# INSERT INTO test  SELECT
i,repeat(md5(i::text),100),repeat(md5(i::text),100) FROM
generate_series(0,10) x(i);
postgres=# SELECT * FROM pg_stat_toast WHERE schemaname = 'public';
-[ RECORD 1 ]+--
schemaname   | public
reloid   | 16829
attnum   | 2
relname  | test
attname  | lz4
externalizations | 0
compressions | 11
compressionsuccesses | 11
compressionsizesum   | 6299710
originalsizesum  | 320403204
-[ RECORD 2 ]+--
schemaname   | public
reloid   | 16829
attnum   | 3
relname  | test
attname  | std
externalizations | 0
compressions | 11
compressionsuccesses | 11
compressionsizesum   | 8198819
originalsizesum  | 320403204


I'm not sure about TOAST, but currently compressions are configurable:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bbe0a81db69bd10bd166907c3701492a29aca294

How about adding a new attribute "method" to pg_stat_toast?
ToastAttrInfo *attr->tai_compression represents how compress the data,
so I think it's easy to add.
Or, is it not needed because pg_attr has information?


That information could certainly be included in the view, grabbing the 
information from pg_attribute.attcompression. It probably should!


I guess the next step will be to include that view in the catalog 
anyway, so I'll do that next.


Thx for the feedback!
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato




Re: In-placre persistance change of a relation

2021-12-20 Thread Kyotaro Horiguchi
At Mon, 20 Dec 2021 07:59:29 +, Jakub Wartak  
wrote in 
> Hi Kyotaro, I'm glad you are still into this
> 
> > I didn't register for some reasons.
> 
> Right now in v8 there's a typo in ./src/backend/catalog/storage.c :
> 
> storage.c: In function 'RelationDropInitFork':
> storage.c:385:44: error: expected statement before ')' token
> pending->unlink_forknum != INIT_FORKNUM)) <-- here, one ) too much

Yeah, I thought that I had removed it. v9 patch I believe is correct.

> > 1. I'm not sure that we want to have the new mark files.
> 
> I can't help with such design decision, but if there are doubts maybe then 
> add checking return codes around:
> a) pg_fsync() and fsync_parent_path() (??) inside mdcreatemark()
> b) mdunlinkmark() inside mdunlinkmark()
> and PANIC if something goes wrong?

The point is whether it is worth the complexity it adds. Since the mark file
can resolve another existing issue (though I don't recall the details) and
this patchset actually fixes it, it can be said to have a certain
extent of persuasiveness. But that doesn't change the fact that it's
additional complexity.

> > 2. Aside of possible bugs, I'm not confident that the crash-safety of
> >  this patch is actually water-tight. At least we need tests for some
> >  failure cases.
> >
> > 3. As mentioned in transam/README, failure in removing smgr mark files
> >leads to immediate shut down.  I'm not sure this behavior is acceptable.
> 
> Doesn't it happen for most of the stuff already? There's even data_sync_retry 
> GUC.

Hmm. Yes, actually it is "as water-tight as possible".  I just want
others' eyes on that perspective.  CF could be the entry point of
others but I'm a bit hesitant to add a new entry..

> > 4. Including the reasons above, this is not fully functionally.
> >For example, if we execute the following commands on primary,
> >replica doesn't work correctly. (boom!)
> > 
> >=# CREATE UNLOGGED TABLE t (a int);
> >=# ALTER TABLE t SET LOGGED;
> > 
> > 
> > The following fixes are done in the attched v8.
> > 
> > - Rebased. Referring to Jakub and Justin's work, I replaced direct
> >   access to ->rd_smgr with RelationGetSmgr() and removed calls to
> >   RelationOpenSmgr(). I still separate the "ALTER TABLE ALL IN
> >   TABLESPACE SET LOGGED/UNLOGGED" statement part.
> > 
> > - Fixed RelationCreate/DropInitFork's behavior for non-target
> >   relations. (From Jakub's work).
> > 
> > - Fixed wording of some comments.
> > 
> > - As revisited, I found a bug around recovery. If the logged-ness of a
> >   relation gets flipped repeatedly in a transaction, duplicate
> >   pending-delete entries are accumulated during recovery and work in a
> >   wrong way. smgr_redo now adds up to one entry per action.
> > 
> > - The issue 4 above is not fixed (yet).
> 
> Thanks again. If you have any list of crash-test ideas maybe I'll have some
> minutes to try to figure them out. Is there any go-to list of stuff to be
> checked to add confidence to this patch (as per point #2)?

Just causing a crash (kill -9) after executing a problem-prone command
sequence, then seeing that recovery works well, would be sufficient.

For example:

create unlogged table; begin; insert ..; alter table set logged;
.  Recovery works.

"create logged; begin; {alter unlogged; alter logged;} * 1000; alter
logged; commit/abort" doesn't pollute pgdata.


> BTW fast feedback regarding that ALTER patch  (there were 4 unlogged tables):
> #  ALTER TABLE ALL IN TABLESPACE tbs1 set logged;
> WARNING:  unrecognized node type: 349

lol  I met a server crash. Will fix. Thanks!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center