Re: Parallel copy
On Wed, Feb 26, 2020 at 8:47 PM Ants Aasma wrote:
> On Tue, 25 Feb 2020 at 18:00, Tomas Vondra wrote:
> > Perhaps. I guess it'll depend on the CSV file (number of fields, ...),
> > so I still think we need to do some measurements first. I'm willing to
> > do that, but (a) I doubt I'll have time for that until after 2020-03,
> > and (b) it'd be good to agree on some set of typical CSV files.
>
> I agree that getting a nice varied dataset would be nice. Including
> things like narrow integer only tables, strings with newlines and
> escapes in them, extremely wide rows.
>
> I tried to capture a quick profile just to see what it looks like.
> Grabbed a random open data set from the web, about 800MB of narrow
> rows CSV [1].
>
> Script:
> CREATE TABLE census (year int,age int,ethnic int,sex int,area text,count text);
> COPY census FROM '.../Data8277.csv' WITH (FORMAT 'csv', HEADER true);
>
> Profile:
> # Samples: 59K of event 'cycles:u'
> # Event count (approx.): 57644269486
> #
> # Overhead  Command   Shared Object  Symbol
> # ........  ........  .............  ......
> #
>     18.24%  postgres  postgres       [.] CopyReadLine
>      9.23%  postgres  postgres       [.] NextCopyFrom
>      8.87%  postgres  postgres       [.] NextCopyFromRawFields
>      5.82%  postgres  postgres       [.] pg_verify_mbstr_len
>      5.45%  postgres  postgres       [.] pg_strtoint32
>      4.16%  postgres  postgres       [.] heap_fill_tuple
>      4.03%  postgres  postgres       [.] heap_compute_data_size
>      3.83%  postgres  postgres       [.] CopyFrom
>      3.78%  postgres  postgres       [.] AllocSetAlloc
>      3.53%  postgres  postgres       [.] heap_form_tuple
>      2.96%  postgres  postgres       [.] InputFunctionCall
>      2.89%  postgres  libc-2.30.so   [.] __memmove_avx_unaligned_erms
>      1.82%  postgres  libc-2.30.so   [.] __strlen_avx2
>      1.72%  postgres  postgres       [.] AllocSetReset
>      1.72%  postgres  postgres       [.] RelationPutHeapTuple
>      1.47%  postgres  postgres       [.] heap_prepare_insert
>      1.31%  postgres  postgres       [.] heap_multi_insert
>      1.25%  postgres  postgres       [.] textin
>      1.24%  postgres  postgres       [.] int4in
>      1.05%  postgres  postgres       [.] tts_buffer_heap_clear
>      0.85%  postgres  postgres       [.] pg_any_to_server
>      0.80%  postgres  postgres       [.] pg_comp_crc32c_sse42
>      0.77%  postgres  postgres       [.] cstring_to_text_with_len
>      0.69%  postgres  postgres       [.] AllocSetFree
>      0.60%  postgres  postgres       [.] appendBinaryStringInfo
>      0.55%  postgres  postgres       [.] tts_buffer_heap_materialize.part.0
>      0.54%  postgres  postgres       [.] palloc
>      0.54%  postgres  libc-2.30.so   [.] __memmove_avx_unaligned
>      0.51%  postgres  postgres       [.] palloc0
>      0.51%  postgres  postgres       [.] pg_encoding_max_length
>      0.48%  postgres  postgres       [.] enlargeStringInfo
>      0.47%  postgres  postgres       [.] ExecStoreVirtualTuple
>      0.45%  postgres  postgres       [.] PageAddItemExtended
>
> So that confirms that the parsing is a huge chunk of overhead with
> current splitting into lines being the largest portion. Amdahl's law
> says that splitting into tuples needs to be made fast before
> parallelizing makes any sense.

I had taken a perf report with the same test data that you had used, and I
was getting the following results:

+   99.61%  0.00%  postgres  postgres  [.] PortalRun
+   99.61%  0.00%  postgres  postgres  [.] PortalRunMulti
+   99.61%  0.00%  postgres  postgres  [.] PortalRunUtility
+   99.61%  0.00%  postgres  postgres  [.] ProcessUtility
+   99.61%  0.00%  postgres  postgres  [.] standard_ProcessUtility
+   99.61%  0.00%  postgres  postgres  [.] DoCopy
+   99.30%  0.94%  postgres  postgres  [.] CopyFrom
+   51.61%  7.76%  postgres  postgres  [.] NextCopyFrom
+   23.66%  0.01%  postgres  postgres  [.] CopyMultiInsertInfoFlush
+   23.61%  0.28%  postgres  postgres  [.] CopyMultiInsertBufferFlush
+   21.99%  1.02%  postgres  postgres  [.] NextCopyFromRawFields
+   19.79%  0.01%  postgres  postgres  [.] table_multi_insert
+   19.32%  3.00%  postgres  postgres  [.] heap_multi_insert
+   18.27%  2.44%  postgres  postgres  [.] InputFunctionCall
+   15.19%  0.89%  postgres  postgres  [.] CopyReadLine
+   13.05%  0.18%  postgres  postgres  [.] ExecMaterializeSlot
+   13.00%  0.55%  postgres  postgres  [.] tts_buffer_heap_materialize
+   12.31%  1.77%  postgres  postgres  [.] heap_form_tuple
+   10.43%
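The Amdahl's-law point above can be made concrete with a quick back-of-envelope calculation. In the sketch below (illustrative only, not part of the thread's measurements), the ~18% figure is CopyReadLine's share from the first profile, taken as the fraction that stays sequential:

```python
def amdahl_speedup(serial_fraction, workers):
    """Upper bound on speedup when `serial_fraction` of the work
    cannot be parallelised (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)

# If line splitting (CopyReadLine, ~18% above) stays sequential,
# speedup is capped near 1/0.18, no matter how many workers we add:
for workers in (2, 4, 8, 1000):
    print(workers, round(amdahl_speedup(0.18, workers), 2))
```

This is why the thread argues that splitting into lines/tuples has to be made fast before parallelising the rest pays off.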
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
On 2020/03/03 14:37, Shinoda, Noriyoshi (PN Japan A Delivery) wrote:
> Hi,
> Thank you for developing good features. The attached patch is a small fix
> to the committed documentation. This patch fixes the description literal
> for the backup_streamed column.

Thanks for the report and patch! Pushed.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
Re: Parallel copy
On Wed, Feb 26, 2020 at 4:24 PM Amit Kapila wrote:
> On Tue, Feb 25, 2020 at 9:30 PM Tomas Vondra wrote:
> > On Sun, Feb 23, 2020 at 05:09:51PM -0800, Andres Freund wrote:
> > > Hi,
> > >
> > > > The one piece of information I'm missing here is at least a very rough
> > > > quantification of the individual steps of CSV processing - for example
> > > > if parsing takes only 10% of the time, it's pretty pointless to start by
> > > > parallelising this part and we should focus on the rest. If it's 50% it
> > > > might be a different story. Has anyone done any measurements?
> > >
> > > Not recently, but I'm pretty sure that I've observed CSV parsing to be
> > > way more than 10%.
> >
> > Perhaps. I guess it'll depend on the CSV file (number of fields, ...),
> > so I still think we need to do some measurements first.
>
> Agreed.
>
> > I'm willing to
> > do that, but (a) I doubt I'll have time for that until after 2020-03,
> > and (b) it'd be good to agree on some set of typical CSV files.
>
> Right, I don't know what is the best way to define that. I can think
> of the below tests.
>
> 1. A table with 10 columns (with datatypes as integers, date, text).
> It has one index (unique/primary). Load with 1 million rows (basically
> the data should be probably 5-10 GB).
> 2. A table with 10 columns (with datatypes as integers, date, text).
> It has three indexes, one index can be (unique/primary). Load with 1
> million rows (basically the data should be probably 5-10 GB).
> 3. A table with 10 columns (with datatypes as integers, date, text).
> It has three indexes, one index can be (unique/primary). It has before
> and after triggers. Load with 1 million rows (basically the data
> should be probably 5-10 GB).
> 4. A table with 10 columns (with datatypes as integers, date, text).
> It has five or six indexes, one index can be (unique/primary). Load
> with 1 million rows (basically the data should be probably 5-10 GB).
I have tried to capture the execution time taken for 3 scenarios which I felt
could give a fair idea:
Test1 (Table with 3 indexes and 1 trigger)
Test2 (Table with 2 indexes)
Test3 (Table without indexes/triggers)

I have captured the following details:
File Read Time      - time taken to read the file from the CopyGetData function.
Read Line Time      - time taken to read a line from the NextCopyFrom function
                      (read time & tokenise time excluded).
Tokenize Time       - time taken to tokenise the contents from the
                      NextCopyFromRawFields function.
Data Execution Time - remaining execution time from the total time.

The execution breakdown for the tests is given below (all times in seconds):

Test    Total Time   File Read   Read Line/         Tokenize   Data Execution
                     Time        Buffer Read Time   Time       Time
Test1   1693.369     0.256       34.173             5.578      1653.362
Test2    736.096     0.288       39.762             6.525       689.521
Test3    112.060     0.266       39.189             6.433        66.172

Steps for the scenarios:

Test1 (Table with 3 indexes and 1 trigger):
CREATE TABLE census2 (year int,age int,ethnic int,sex int,area text,count text);
CREATE TABLE census3 (year int,age int,ethnic int,sex int,area text,count text);
CREATE INDEX idx1_census2 on census2(year);
CREATE INDEX idx2_census2 on census2(age);
CREATE INDEX idx3_census2 on census2(ethnic);
CREATE or REPLACE FUNCTION census2_afterinsert() RETURNS TRIGGER AS $$
BEGIN
  INSERT INTO census3 SELECT * FROM census2 limit 1;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER census2_trigger AFTER INSERT ON census2 FOR EACH ROW EXECUTE PROCEDURE census2_afterinsert();
COPY census2 FROM 'Data8277.csv' WITH (FORMAT 'csv', HEADER true);

Test2 (Table with 2 indexes):
CREATE TABLE census1 (year int,age int,ethnic int,sex int,area text,count text);
CREATE INDEX idx1_census1 on census1(year);
CREATE INDEX idx2_census1 on census1(age);
COPY census1 FROM 'Data8277.csv' WITH (FORMAT 'csv', HEADER true);

Test3 (Table without indexes/triggers):
CREATE TABLE census (year int,age int,ethnic int,sex int,area text,count text);
COPY census FROM 'Data8277.csv' WITH (FORMAT 'csv', HEADER true);

Note: The Data8277.csv used was the same data that Ants Aasma had used.

From the above results we can infer that reading lines will have to be done
sequentially. Read line time takes about 2.01%, 5.40% and 34.97% of the total
time in the three tests respectively. I felt we will be able to parallelise
the remaining phases of the copy. The performance improvement will vary based
on the scenario (indexes/triggers); it will be proportionate to the number of
indexes and triggers. Read line can also be parallelised in text format
(non-csv). I feel parallelising copy could give a significant improvement in
quite some scenarios.

Further, I'm planning to see how the execution will be for a toast table. I'm
also planning to do a test on a RAM disk, where I will configure the data on
the RAM disk so that we can further eliminate the I/O cost.

Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
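As a sanity check on the percentages quoted above, the sequential fraction (read line time / total time) and the resulting parallel-speedup ceiling can be recomputed from the table. The script below is just illustrative arithmetic, not part of the test harness used for the measurements:

```python
# Total and read-line times (seconds) from the breakdown table above.
timings = {
    "Test1": {"total": 1693.369, "read_line": 34.173},
    "Test2": {"total": 736.096,  "read_line": 39.762},
    "Test3": {"total": 112.060,  "read_line": 39.189},
}

for name, t in timings.items():
    serial = t["read_line"] / t["total"]   # fraction that must stay sequential
    ceiling = 1.0 / serial                 # Amdahl's law bound, infinite workers
    print(f"{name}: read line = {serial:.2%} of total, max speedup ~ {ceiling:.1f}x")
```

So the no-index case (Test3), where line reading is ~35% of the work, caps out near a 2.9x speedup unless line splitting itself is parallelised, while the index/trigger-heavy cases leave far more room.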
RE: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
Hi,

Thank you for developing good features. The attached patch is a small fix to
the committed documentation. This patch fixes the description literal for the
backup_streamed column.

Regards,
Noriyoshi Shinoda

-----Original Message-----
From: Fujii Masao [mailto:masao.fu...@oss.nttdata.com]
Sent: Tuesday, March 3, 2020 12:09 PM
To: Kyotaro Horiguchi
Cc: amitlangot...@gmail.com; masahiko.saw...@2ndquadrant.com; pgsql-hack...@postgresql.org
Subject: Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

On 2020/03/03 9:27, Kyotaro Horiguchi wrote:
> At Mon, 2 Mar 2020 17:29:30 +0900, Fujii Masao wrote in
> > > Attached is the updated version of the patch.
> > > The previous patch used only pgstat_progress_update_param() even
> > > when updating multiple values. Since those updates are not atomic,
> > > this can cause readers of the values to see the intermediate states.
> > > To avoid this issue, the latest patch uses
> > > pgstat_progress_update_multi_param(), instead.
> >
> > Attached the updated version of the patch.
> > Barring any objections, I plan to commit this patch.
>
> It is working as designed and the status names are fine to me.

Thanks for the review! I pushed the patch.

> The last one comment from me.
> The newly defined symbols have inconsistent indents.

Yes, I fixed that.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

monitoring_sgml.diff
Description: monitoring_sgml.diff
Re: color by default
On Mon, Mar 02, 2020 at 01:00:44PM +0100, Juan José Santamaría Flecha wrote:
> - The new entry in the documentation, specially as the PG_COLORS parameter
> seems to be currently undocumented. The programs that can use PG_COLOR
> would benefit from getting a link to it.

The actual problem here is that we don't have an actual centralized place
where we could put that stuff. And anything able to use this option is
basically anything using src/common/logging.c.

Regarding PG_COLORS, the commit message of cc8d415 mentions it, but we have
no actual example of how to use it, and the original thread has zero
reference to it:
https://www.postgresql.org/message-id/6a609b43-4f57-7348-6480-bd022f924...@2ndquadrant.com

And in fact, it took me a while to figure out that using it is a mix of
three keywords ("error", "warning" or "locus") separated by colons which
need to have an equal sign to the color defined. Here is for example how to
make the locus show up in yellow with errors in blue:

export PG_COLORS='error=01;34:locus=01;33'

Having to dig into the code to find out that stuff is not a good user
experience. And I found out about that only because I worked on a patch
touching this area yesterday.

-- 
Michael

signature.asc
Description: PGP signature
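The PG_COLORS format described above — keyword=SGR-codes pairs separated by colons, with the SGR codes themselves using semicolons — can be sketched like this. This is a hypothetical illustrative parser, not the actual src/common/logging.c code:

```python
def parse_pg_colors(value):
    """Split a PG_COLORS-style string such as 'error=01;34:locus=01;33'
    into a {keyword: sgr_codes} dict.  Pairs are colon-separated; the
    SGR escape codes inside a pair use semicolons, so they survive the
    split on ':' intact."""
    colors = {}
    for pair in value.split(":"):
        if "=" in pair:
            key, _, sgr = pair.partition("=")
            colors[key] = sgr
    return colors

print(parse_pg_colors("error=01;34:locus=01;33"))
# {'error': '01;34', 'locus': '01;33'}
```

The recognized keywords per the message above are "error", "warning" and "locus"; the values are standard ANSI SGR attribute lists (e.g. 01;33 for bold yellow).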
Re: Fastpath while arranging the changes in LSN order in logical decoding
On Wed, Feb 19, 2020 at 6:00 AM David Zhang wrote:
>
> After manually applied the patch, a diff regenerated is attached.
>
> On 2020-02-18 4:16 p.m., David Zhang wrote:
> > 1. Tried to apply the patch to PG 12.2 commit
> > 45b88269a353ad93744772791feb6d01bc7e1e42 (HEAD -> REL_12_2, tag: REL_12_2),
> > it doesn't work. Then tried to check the patch, and found the errors
> > showing below.
> > $ git apply --check 0001-Fastpath-for-sending-changes-to-output-plugin-in-log.patch
> > error: patch failed: contrib/test_decoding/logical.conf:1
> > error: contrib/test_decoding/logical.conf: patch does not apply
> > error: patch failed: src/backend/replication/logical/reorderbuffer.c:1133
> > error: src/backend/replication/logical/reorderbuffer.c: patch does not apply
> >
> > 2. Ran a further check for file "logical.conf", and found there is only one
> > commit since 2014, which doesn't have the parameter
> > "logical_decoding_work_mem = 64kB".
> >
> > 3. Manually apply the patch including
> > src/backend/replication/logical/reorderbuffer.c, and then ran a simple
> > logical replication test. A connection issue is found like below,
> > "table public.pgbench_accounts: INSERT: aid[integer]:4071 bid[integer]:1
> > abalance[integer]:0 filler[character]:'      '
> > pg_recvlogical: error: could not receive data from WAL stream: server
> > closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > pg_recvlogical: disconnected; waiting 5 seconds to try again"
> >
> > 4. This connection issue can be reproduced on PG 12.2 commit mentioned
> > above, the basic steps:
> > 4.1 Change "wal_level = logical" in "postgresql.conf"
> > 4.2 Create a logical slot and listen on it,
> > $ pg_recvlogical -d postgres --slot test --create-slot
> > $ pg_recvlogical -d postgres --slot test --start -f -
> >
> > 4.3 From another terminal, run the command below,
> > $ pgbench -i -p 5432 -d postgres
> >
> > Let me know if I did something wrong, and if a new patch is available, I
> > can re-run the test on the same environment.

Thanks for testing and rebasing. I think one of the hunks is missing in your
rebased version. That could be the reason for failure. Can you test on my
latest version?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Fastpath while arranging the changes in LSN order in logical decoding
On Tue, Mar 3, 2020 at 8:42 AM Dilip Kumar wrote: > > On Mon, Mar 2, 2020 at 7:27 PM David Steele wrote: > > > > Hi Dilip, > > > > On 2/18/20 7:30 PM, David Zhang wrote: > > > After manually applied the patch, a diff regenerated is attached. > > > > David's updated patch applies but all logical decoding regression tests > > are failing on cfbot. > > > > Do you know when you will be able to supply an updated patch? > > I will try to send in a day or two. I have rebased the patch. check-world is passing. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com v2-0001-Fastpath-for-sending-changes-to-output-plugin-in-.patch Description: Binary data
Re: [PATCH] Add schema and table names to partition error
On 3/1/20 10:09 PM, Amit Langote wrote:
> Hi Chris,
>
> On Mon, Mar 2, 2020 at 8:51 AM Chris Bandy wrote:
> > On 3/1/20 5:14 AM, Amit Kapila wrote:
> > > On Sun, Mar 1, 2020 at 10:10 AM Amit Langote wrote:
> > > > There are a couple more instances in src/backend/commands/tablecmds.c
> > > > where the partition constraint is checked. Maybe better fix these too,
> > > > for completeness.
> > >
> > > Another thing we might need to see is which of these can be
> > > back-patched. We should also try to write the tests for cases we are
> > > changing even if we don't want to commit those.
> >
> > I don't have any opinion on back-patching. Existing tests pass. I wasn't
> > able to find another test that checks the constraint field of errors.
> > There's a little bit in the tests for psql, but that is about the
> > \errverbose functionality rather than specific errors and their fields.
>
> Actually, it's not a bad idea to use \errverbose to test this.

I've added a second patch with tests that cover three of the five errors
touched by the first patch. Rather than \errverbose, I simply \set VERBOSITY
verbose. I could not find a way to exclude the location field from the
output, so those lines will likely be out of date soon--if not already.

I couldn't find a way to exercise the errors in tablecmds.c. Does anyone
know how to instigate a table rewrite that would violate partition
constraints?
I tried:

ALTER TABLE pterr1 ALTER y TYPE bigint USING (y - 5);
ERROR:  42P16: cannot alter column "y" because it is part of the partition key of relation "pterr1"
LOCATION:  ATPrepAlterColumnType, tablecmds.c:10812

Thanks,
Chris

>From c6b39accf9f51f9c08a2fc62e848144776e23ffb Mon Sep 17 00:00:00 2001
From: Chris Bandy
Date: Sat, 29 Feb 2020 12:47:56 -0600
Subject: [PATCH] Add schema and table names to partition errors

---
 src/backend/commands/tablecmds.c      | 6 ++++--
 src/backend/executor/execMain.c       | 3 ++-
 src/backend/executor/execPartition.c  | 3 ++-
 src/backend/partitioning/partbounds.c | 3 ++-
 4 files changed, 10 insertions(+), 5 deletions(-)

diff --git src/backend/commands/tablecmds.c src/backend/commands/tablecmds.c
index 02a7c04fdb..6a32812a1f 100644
--- src/backend/commands/tablecmds.c
+++ src/backend/commands/tablecmds.c
@@ -5342,12 +5342,14 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 				ereport(ERROR,
 						(errcode(ERRCODE_CHECK_VIOLATION),
 						 errmsg("updated partition constraint for default partition \"%s\" would be violated by some row",
-								RelationGetRelationName(oldrel))));
+								RelationGetRelationName(oldrel)),
+						 errtable(oldrel)));
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_CHECK_VIOLATION),
 						 errmsg("partition constraint of relation \"%s\" is violated by some row",
-								RelationGetRelationName(oldrel))));
+								RelationGetRelationName(oldrel)),
+						 errtable(oldrel)));
 		}
 
 		/* Write the tuple out to the new relation */
diff --git src/backend/executor/execMain.c src/backend/executor/execMain.c
index ee5c3a60ff..7cb486b211 100644
--- src/backend/executor/execMain.c
+++ src/backend/executor/execMain.c
@@ -1878,7 +1878,8 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo,
 			(errcode(ERRCODE_CHECK_VIOLATION),
 			 errmsg("new row for relation \"%s\" violates partition constraint",
 					RelationGetRelationName(resultRelInfo->ri_RelationDesc)),
-			 val_desc ? errdetail("Failing row contains %s.", val_desc) : 0));
+			 val_desc ? errdetail("Failing row contains %s.", val_desc) : 0,
+			 errtable(resultRelInfo->ri_RelationDesc)));
 }
 
 /*
diff --git src/backend/executor/execPartition.c src/backend/executor/execPartition.c
index c13b1d3501..a5542b92c7 100644
--- src/backend/executor/execPartition.c
+++ src/backend/executor/execPartition.c
@@ -345,7 +345,8 @@ ExecFindPartition(ModifyTableState *mtstate,
 							RelationGetRelationName(rel)),
 					 val_desc ?
 					 errdetail("Partition key of the failing row contains %s.",
-							   val_desc) : 0));
+							   val_desc) : 0,
+					 errtable(rel)));
 		}
 
 		if (partdesc->is_leaf[partidx])
diff --git src/backend/partitioning/partbounds.c src/backend/partitioning/partbounds.c
index 35953f23fa..4c47f54a57 100644
--- src/backend/partitioning/partbounds.c
+++ src/backend/partitioning/partbounds.c
@@ -1366,7 +1366,8 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 			ereport(ERROR,
 					(errcode(ERRCODE_CHECK_VIOLATION),
 					 errmsg("updated partition constraint for default partition \"%s\" would be violated by some row",
-							RelationGetRelationName(default_rel))));
+							RelationGetRelationName(default_rel)),
+							errtable(default_rel)));
 
 		ResetExprContext(econtext);
 		CHECK_FOR_INTERRUPTS();
-- 
2.11.0

>From 2d2ab39bcbc7ea13cd04986b0c8aad6f14449ebd Mon Sep 17 00:00:00 2001
From: Chris Bandy
Date: Mon, 2 Mar 2020 22:13:28 -0600
Subject: [PATCH] Tests for partition error fields

---
 src/test/regress/expected/partition_errors.out | 37
Re: logical replication empty transactions
On Mon, Mar 2, 2020 at 4:56 PM Amit Kapila wrote:
> On Mon, Mar 2, 2020 at 9:01 AM Dilip Kumar wrote:
> > On Sat, Nov 9, 2019 at 7:29 AM Euler Taveira wrote:
> > > On Mon, Oct 21, 2019 at 21:20, Jeff Janes wrote:
> > > >
> > > > After setting up logical replication of a slowly changing table using
> > > > the built in pub/sub facility, I noticed way more network traffic than
> > > > made sense. Looking into it I see that every transaction in that
> > > > database on the master gets sent to the replica. 99.999+% of them are
> > > > empty transactions ('B' message and 'C' message with nothing in
> > > > between) because the transactions don't touch any tables in the
> > > > publication, only non-replicated tables. Is doing it this way
> > > > necessary for some reason? Couldn't we hold the transmission of 'B'
> > > > until something else comes along, and then if that next thing is 'C'
> > > > drop both of them?
> > >
> > > That is not optimal. Those empty transactions are a waste of bandwidth.
> > > We can suppress them if no changes will be sent. test_decoding
> > > implements "skip empty transaction" as you described above and I did
> > > something similar to it. Patch is attached.
> >
> > I think this significantly reduces the network bandwidth for empty
> > transactions. I have briefly reviewed the patch and it looks good to me.
>
> One thing that is not clear to me is how we will advance restart_lsn
> if we don't send any empty xact in a system where there are many such
> xacts. IIRC, the restart_lsn is advanced based on the confirmed_flush
> lsn sent by the subscriber. After this change, the subscriber won't be
> able to send the confirmed_flush, and for a long time we won't be able
> to advance restart_lsn. Is that correct, and if so, why do we think that
> is acceptable? One might argue that restart_lsn will be advanced as soon
> as we send the first non-empty xact, but I am not sure if that is good
> enough. What do you think?
It seems like a valid point. One idea could be that we track the last commit
LSN which we streamed, and if the confirmed flush location is already greater
than or equal to that, then even if we skip sending the commit message we can
advance the confirmed flush location locally. Logically, it should not cause
any problem, because we have already got the confirmation for whatever we
have streamed so far; for the other commits (which we are skipping) we can
advance it locally, since we are sure that we don't have any streamed commit
which is not yet confirmed by the subscriber. This is just my thought, but if
we think about it from the code and design perspective then it might
complicate things and sounds hackish.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
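The invariant sketched in the reply above — advance the confirmed flush location locally for a skipped empty commit only when everything actually streamed has already been confirmed — might look roughly like this. All names here are hypothetical; this is a sketch of the proposed rule, not pgoutput/walsender code:

```python
class SlotState:
    """Toy model of a replication slot's feedback bookkeeping."""

    def __init__(self):
        self.last_streamed_commit_lsn = 0  # last commit actually sent downstream
        self.confirmed_flush_lsn = 0       # last LSN the subscriber confirmed

    def on_commit(self, commit_lsn, is_empty):
        if not is_empty:
            # Stream it and wait for subscriber feedback as usual.
            self.last_streamed_commit_lsn = commit_lsn
            return "streamed"
        # Empty transaction: it is only safe to advance locally if the
        # subscriber has already confirmed everything we ever streamed.
        if self.confirmed_flush_lsn >= self.last_streamed_commit_lsn:
            self.confirmed_flush_lsn = max(self.confirmed_flush_lsn, commit_lsn)
            return "skipped, advanced locally"
        return "skipped, waiting for feedback"

s = SlotState()
s.on_commit(100, is_empty=False)   # streamed; confirmation still pending
assert s.on_commit(110, is_empty=True) == "skipped, waiting for feedback"
s.confirmed_flush_lsn = 100        # subscriber feedback for LSN 100 arrives
assert s.on_commit(120, is_empty=True) == "skipped, advanced locally"
assert s.confirmed_flush_lsn == 120
```

This captures why the approach is safe in principle (nothing unconfirmed is ever skipped past) while also hinting at the extra state the real code would have to maintain, which is the complication the reply worries about.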
Re: First WAL segment file that initdb creates
On 2020/02/19 5:26, David Zhang wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  not tested
> Implements feature:       not tested
> Spec compliant:           not tested
> Documentation:            tested, passed
>
> The issue has been verified using below steps:
> 1. $ initdb -D /home/test/PG122DATA/data
> 2. $ ls -l /home/test/PG122DATA/data/pg_wal/
> total 16388
> -rw------- 1 test test 16777216 Feb 18 12:07 000000010000000000000001
> drwx------ 2 test test     4096 Feb 18 12:07 archive_status
>
> The first WAL segment file created by initdb is
> "000000010000000000000001", not "000000010000000000000000".

Thanks for the test! I pushed the patch.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
Re: [PATCH] Documentation bug related to client authentication using TLS certificate
Hi, Cary.

On 3/2/20 1:06 PM, Cary Huang wrote:
> Hi
>
> I found a document bug about client authentication using TLS certificate.
> When clientcert authentication is enabled in pg_hba.conf, libpq does not
> verify that the *common name* in certificate matches *database username*
> like it is described in the documentation before allowing client
> connection. Instead, when sslmode is set to “verify-full”, libpq will
> verify if the *server host name* matches the *common name* in client
> certificate.

This sounds incorrect. My understanding is that the *server* host name is
always matched with the *server* common name.

> When sslmode is set to “verify-ca”, libpq will verify that the client is
> trustworthy by checking the certificate trust chain up to the root
> certificate and it does not verify *server hostname* and certificate
> *common name* match in this case.

Similarly, libpq will verify the *server* is trustworthy by checking the
*server* certificate up to the root. It does not verify that the host name
matches the common name in the *server* certificate.

In all cases, libpq is responsible for verifying the *server* is who it
claims to be.

-- 
Chris
Re: Symbolic names for the values of typalign and typstorage
Alvaro Herrera writes:
> On 2020-Mar-02, Tom Lane wrote:
> > One thing that I'm not totally happy about, as this stands, is that
> > we have to #include "catalog/pg_type.h" in various places we did
> > not need to before (although only a fraction of the files I touched
> > need that).
>
> If we think that pg_type.h is the header to handle access to the pg_type
> catalog, then I would think that the function declarations at the bottom
> should be in some "internal" header file; then we can get rid of most
> the #includes in pg_type.h.

Well, aside from indirect inclusions, pg_type.h also brings in a bunch of
type OID macros, which I feel we don't want to broadcast everywhere.

One argument in favor of sticking these new macros somewhere "more central"
is that they apply to both pg_type and pg_attribute (that is, attalign and
attstorage also use them). That's not a strong argument, maybe, but it's
something.

			regards, tom lane
Re: Fastpath while arranging the changes in LSN order in logical decoding
On Mon, Mar 2, 2020 at 7:27 PM David Steele wrote:
>
> Hi Dilip,
>
> On 2/18/20 7:30 PM, David Zhang wrote:
> > After manually applied the patch, a diff regenerated is attached.
>
> David's updated patch applies but all logical decoding regression tests
> are failing on cfbot.
>
> Do you know when you will be able to supply an updated patch?

I will try to send in a day or two.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
On 2020/03/03 9:27, Kyotaro Horiguchi wrote:
> At Mon, 2 Mar 2020 17:29:30 +0900, Fujii Masao wrote in
> > > Attached is the updated version of the patch.
> > > The previous patch used only pgstat_progress_update_param() even
> > > when updating multiple values. Since those updates are not atomic,
> > > this can cause readers of the values to see the intermediate states.
> > > To avoid this issue, the latest patch uses
> > > pgstat_progress_update_multi_param(), instead.
> >
> > Attached the updated version of the patch.
> > Barring any objections, I plan to commit this patch.
>
> It is working as designed and the status names are fine to me.

Thanks for the review! I pushed the patch.

> The last one comment from me.
> The newly defined symbols have inconsistent indents.

Yes, I fixed that.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
On 2020/03/02 19:27, Sergei Kornilov wrote:
> Hello
>
> I reviewed the recently published patch. Looks good to me.

Thanks for the review! I pushed the patch.

> One small note: the values for the new definitions in progress.h seem
> not to be aligned vertically. However, pgindent doesn't object.

Yes, I fixed that.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
Re: Add LogicalTapeSetExtend() to logtape.c
On Fri, Feb 28, 2020 at 12:38:55PM -0800, Jeff Davis wrote:
> On Fri, 2020-02-28 at 14:16 +0800, Adam Lee wrote:
> > I noticed another difference, I was using palloc0(), which could be
> > one of the reasons, but not sure.
>
> I changed the palloc0()'s in your code to plain palloc(), and it didn't
> make any perceptible difference. Still slower than the version I posted
> that keeps the flexible array.
>
> Did you compare all 3? Master, with your patch, and with my patch? I'd
> like to see if you're seeing the same thing that I am.
>
> > Tested your hashagg-20200226.patch on my laptop (Apple clang version
> > 11.0.0), the average time is 25.9s:
>
> That sounds high -- my runs are about half that time. Is that with a
> debug build or an optimized one?
>
> Regards,
> Jeff Davis

Yes, I was running a debug version. I usually do 'CFLAGS=-O0 -g3'
'--enable-cassert' '--enable-debug'.

Test with a general build:

Master:                            12729ms  12970ms  12999ms
With my patch (a pointer):         12965ms  13273ms  13116ms
With your patch (flexible array):  12906ms  12991ms  13043ms

Not obvious, I suppose; anyway, your patch looks good to me.

-- 
Adam Lee
Re: SQL/JSON: functions
On 2020-03-03 00:24, Nikita Glukhov wrote:
> On 03.03.2020 2:12, Erik Rijkers wrote:
> > On 2020-03-02 23:33, Nikita Glukhov wrote:
> > > Attached 42th version of the patches.
> >
> > 20200302/0001-Jsonpath-support-for-json-v42.patch    +
> > 20200302/0002-Add-common-SQL_JSON-clauses-v42.patch  +
> > 20200302/0003-Add-invisible-coercion-form-v42.patch  +
> > 20200302/0004-Add-function-formats-v42.patch         +
> > 20200302/0005-SQLJSON-constructors-v42.patch         +
> > 20200302/0006-IS-JSON-predicate-v42.patch            +
> > 20200302/0007-SQLJSON-query-functions-v42.patch      +

Thanks -- those applied fine. Compiling, I get this error from the
pg_stat_statements module (in contrib):

-- [2020.03.03 02:22:20 json_functions/0] make contrib
pg_stat_statements.c: In function ‘JumbleExpr’:
pg_stat_statements.c:2933:29: error: ‘JsonExpr’ {aka ‘struct JsonExpr’} has no member named ‘raw_expr’
 2933 |    JumbleExpr(jstate, jexpr->raw_expr);
      |                             ^~
make[1]: *** [pg_stat_statements.o] Error 1
make: *** [all-pg_stat_statements-recurse] Error 2
-- contrib make returned 2 - abort
../../src/Makefile.global:919: recipe for target 'pg_stat_statements.o' failed
Makefile:93: recipe for target 'all-pg_stat_statements-recurse' failed
pg_stat_statements.c: In function ‘JumbleExpr’:
pg_stat_statements.c:2933:29: error: ‘JsonExpr’ {aka ‘struct JsonExpr’} has no member named ‘raw_expr’
 2933 |    JumbleExpr(jstate, jexpr->raw_expr);
      |                             ^~
make[1]: *** [pg_stat_statements.o] Error 1
make: *** [install-pg_stat_statements-recurse] Error 2
-- contrib make install returned 2 - abort

(so I've edited out pg_stat_statements from the contrib/Makefile and
compiled a working server for further testing.)

Thanks,
Erik Rijkers
Re: ALTER tbl rewrite loses CLUSTER ON index
@cfbot: resending with only Amit's 0001, since Michael pushed a variation on 0002.

-- 
Justin

>From 865fc2713ad29d0f8c0f63609a7c15c83cfa5cfe Mon Sep 17 00:00:00 2001
From: Amit Langote
Date: Thu, 6 Feb 2020 18:14:16 +0900
Subject: [PATCH v5] ALTER tbl rewrite loses CLUSTER ON index

---
 src/backend/commands/tablecmds.c          | 39 ++++++++++++++++++++++
 src/test/regress/expected/alter_table.out | 34 ++++++++++++++++++
 src/test/regress/sql/alter_table.sql      | 16 ++++++++-
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 02a7c04fdb..6b2469bd09 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -490,6 +490,7 @@ static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
 static void RebuildConstraintComment(AlteredTableInfo *tab, int pass, Oid objid,
 									 Relation rel, List *domname,
 									 const char *conname);
+static void PreserveClusterOn(AlteredTableInfo *tab, int pass, Oid indoid);
 static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
 static void TryReuseForeignKey(Oid oldId, Constraint *con);
 static ObjectAddress ATExecAlterColumnGenericOptions(Relation rel, const char *colName,
@@ -11838,6 +11839,9 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 			newcmd->def = (Node *) stmt;
 			tab->subcmds[AT_PASS_OLD_INDEX] =
 				lappend(tab->subcmds[AT_PASS_OLD_INDEX], newcmd);
+
+			/* Preserve index's indisclustered property, if set. */
+			PreserveClusterOn(tab, AT_PASS_OLD_INDEX, oldId);
 		}
 		else if (IsA(stm, AlterTableStmt))
 		{
@@ -11874,6 +11878,9 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 										 rel, NIL,
 										 indstmt->idxname);
+
+					/* Preserve index's indisclustered property, if set. */
+					PreserveClusterOn(tab, AT_PASS_OLD_INDEX, indoid);
 				}
 				else if (cmd->subtype == AT_AddConstraint)
 				{
@@ -11996,6 +12003,38 @@ RebuildConstraintComment(AlteredTableInfo *tab, int pass, Oid objid,
 	tab->subcmds[pass] = lappend(tab->subcmds[pass], newcmd);
 }
 
+/*
+ * For a table's index that is to be recreated due to PostAlterType
+ * processing, preserve its indisclustered property by issuing ALTER TABLE
+ * CLUSTER ON command on the table that will run after the command to recreate
+ * the index.
+ */
+static void
+PreserveClusterOn(AlteredTableInfo *tab, int pass, Oid indoid)
+{
+	HeapTuple	indexTuple;
+	Form_pg_index indexForm;
+
+	Assert(OidIsValid(indoid));
+	Assert(pass == AT_PASS_OLD_INDEX);
+
+	indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indoid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indoid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+	if (indexForm->indisclustered)
+	{
+		AlterTableCmd *newcmd = makeNode(AlterTableCmd);
+
+		newcmd->subtype = AT_ClusterOn;
+		newcmd->name = get_rel_name(indoid);
+		tab->subcmds[pass] = lappend(tab->subcmds[pass], newcmd);
+	}
+
+	ReleaseSysCache(indexTuple);
+}
+
 /*
  * Subroutine for ATPostAlterTypeParse().  Calls out to CheckIndexCompatible()
  * for the real analysis, then mutates the IndexStmt based on that verdict.
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index fb6d86a269..a01c6d6ec5 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -4296,3 +4296,37 @@ create trigger xtrig
 update bar1 set a = a + 1;
 INFO:  a=1, b=1
 /* End test case for bug #16242 */
+-- alter type rewrite/rebuild should preserve cluster marking on index
+create table alttype_cluster (a int);
+create index alttype_cluster_a on alttype_cluster (a);
+alter table alttype_cluster cluster on alttype_cluster_a;
+select indisclustered from pg_index where indrelid = 'alttype_cluster'::regclass;
+ indisclustered 
+----------------
+ t
+(1 row)
+
+alter table alttype_cluster alter a type bigint;
+select indisclustered from pg_index where indrelid = 'alttype_cluster'::regclass;
+ indisclustered 
+----------------
+ t
+(1 row)
+
+drop index alttype_cluster_a;
+alter table alttype_cluster add primary key (a);
+alter table alttype_cluster cluster on alttype_cluster_pkey;
+select indisclustered from pg_index where indrelid = 'alttype_cluster'::regclass;
+ indisclustered 
+----------------
+ t
+(1 row)
+
+alter table alttype_cluster alter a type int;
+select indisclustered from pg_index where indrelid = 'alttype_cluster'::regclass;
+ indisclustered 
+----------------
+ t
+(1 row)
+
+drop table alttype_cluster;
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 3801f19c58..6e9048bbec 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -2802,7 +2802,6 @@ drop table at_test_sql_partop;
 drop operator class at_test_sql_partop using btree;
 drop function at_test_sql_partop;
Re: ALTER tbl rewrite loses CLUSTER ON index
On Mon, Mar 02, 2020 at 12:28:18PM +0900, Michael Paquier wrote: > > +SELECT indexrelid::regclass FROM pg_index WHERE > > indrelid='concur_clustered'::regclass; > > This test should check after indisclustered. Except that, the patch > is fine so I'll apply it if there are no objections. Oops - I realized that, but didn't send a new patch before you noticed - thanks for handling it. -- Justin
Re: Symbolic names for the values of typalign and typstorage
On 2020-Mar-02, Tom Lane wrote:

> While looking at Tomas' ALTER TYPE patch, I got annoyed by the fact
> that all of the backend writes constants of type alignment and type
> storage values as literal characters, such as 'i' and 'x'. This is
> not our style for most other "poor man's enum" catalog columns, and
> it makes it really hard to grep for relevant code. Hence, attached
> is a proposed patch to invent #define names for those values.

Makes sense.

> As is our custom for other similar catalog columns, I only used the
> macros in C code. There are some references in SQL code too,
> particularly in the regression tests, but the difficulty of replacing
> symbolic references in SQL code seems more than it's worth to fix.

Agreed.

> One thing that I'm not totally happy about, as this stands, is that
> we have to #include "catalog/pg_type.h" in various places we did
> not need to before (although only a fraction of the files I touched
> need that). Part of the issue is that I used the TYPALIGN_XXX
> macros in tupmacs.h, but did not #include pg_type.h there because
> I was concerned about macro inclusion bloat. Plausible alternatives
> to the way I did it here include
>
> * just bite the bullet and #include pg_type.h in tupmacs.h;

I like this one the most -- better than the alternative in the patch --
because it's the most honest IMO, except that there seems to be
altogether too much cruft in pg_type.h that should be elsewhere
(particularly nodes/nodes.h, which includes a large number of other
headers). If we think that pg_type.h is the header to handle access to
the pg_type catalog, then I would think that the function declarations
at the bottom should be in some "internal" header file; then we can get
rid of most of the #includes in pg_type.h.

> Thoughts? Anybody want to say that this is more code churn
> than it's worth?

It seems worthy cleanup to me.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()
On Sun, Mar 1, 2020 at 12:15 PM Floris Van Nee wrote: > Minor conflict with that patch indeed. Attached is rebased version. I'm > running some tests now - will post the results here when finished. Thanks. We're going to have to go back to my original approach to inlining. At least, it seemed to be necessary to do that to get any benefit from the patch on my comparatively modest workstation (using a similar pgbench SELECT benchmark to the one that you ran). Tom also had a concern about the portability of inlining without using "static inline" -- that is another reason to avoid the approach to inlining taken by v3 + v4. It's possible (though not very likely) that performance has been impacted by the deduplication patch (commit 0d861bbb), since it updated the definition of BTreeTupleGetNAtts() itself. Attached is v5, which inlines in a targeted fashion, pretty much in the same way as the earliest version. This is the same as v4 in every other way. Perhaps you can test this. -- Peter Geoghegan v5-0001-Avoid-pipeline-stall-in-_bt_compare.patch Description: Binary data v5-0002-Inline-_bt_compare.patch Description: Binary data
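As context for the portability concern Tom raised, the conventional way to get inlining without nonstandard linkage behavior is a `static inline` definition in a header, which is well-defined C99 regardless of compiler. A minimal illustrative sketch (the function here is a hypothetical stand-in, not the actual `_bt_compare` signature):

```c
#include <assert.h>

/*
 * In PostgreSQL headers, hot-path helpers are typically defined
 * "static inline": every translation unit that includes the header
 * gets its own inlinable copy, avoiding the compiler-specific linkage
 * rules of a plain "inline" definition.  Hypothetical comparator
 * standing in for something like _bt_compare:
 */
static inline int
bt_compare_sketch(int a, int b)
{
	/* returns -1, 0, or +1, branch-free */
	return (a > b) - (a < b);
}
```

The alternative that v3/v4 apparently used (relying on the compiler to inline an extern function across files) is what raises the portability question; `static inline` sidesteps it at the cost of a copy per translation unit.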
Re: ALTER tbl rewrite loses CLUSTER ON index
On Mon, Mar 02, 2020 at 12:28:18PM +0900, Michael Paquier wrote: > This test should check after indisclustered. Except that, the patch > is fine so I'll apply it if there are no objections. I got a second look at this one, and applied it down to 12 after some small modifications in the new test and in the comments. -- Michael signature.asc Description: PGP signature
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
At Mon, 2 Mar 2020 17:29:30 +0900, Fujii Masao wrote in
> > Attached is the updated version of the patch.
> > The previous patch used only pgstat_progress_update_param()
> > even when updating multiple values. Since those updates are
> > not atomic, this can cause readers of the values to see
> > the intermediate states. To avoid this issue, the latest patch
> > uses pgstat_progress_update_multi_param(), instead.
>
> Attached the updated version of the patch.
> Barring any objections, I plan to commit this patch.

It is working as designed and the status names are fine by me. One last
comment from me: the newly defined symbols have inconsistent indents.

===
#define PROGRESS_BASEBACKUP_PHASE 0
#define PROGRESS_BASEBACKUP_BACKUP_TOTAL 1
#define PROGRESS_BASEBACKUP_BACKUP_STREAMED 2
#define PROGRESS_BASEBACKUP_TBLSPC_TOTAL 3
#define PROGRESS_BASEBACKUP_TBLSPC_STREAMED 4

/* Phases of pg_basebackup (as advertised via PROGRESS_BASEBACKUP_PHASE) */
#define PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT 1
#define PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE 2
#define PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP 3
#define PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE 4
#define PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL 5

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
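For comparison, a consistently indented layout of the same constants might look like the sketch below (values exactly as quoted in the review; only the formatting differs):

```c
/* Progress parameters for pg_basebackup, values as quoted above */
#define PROGRESS_BASEBACKUP_PHASE                       0
#define PROGRESS_BASEBACKUP_BACKUP_TOTAL                1
#define PROGRESS_BASEBACKUP_BACKUP_STREAMED             2
#define PROGRESS_BASEBACKUP_TBLSPC_TOTAL                3
#define PROGRESS_BASEBACKUP_TBLSPC_STREAMED             4

/* Phases of pg_basebackup (as advertised via PROGRESS_BASEBACKUP_PHASE) */
#define PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT       1
#define PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE  2
#define PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP         3
#define PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE      4
#define PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL          5
```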
Re: Some improvements to numeric sqrt() and ln()
Dear Dean,

On 2020-03-01 20:47, Dean Rasheed wrote:
> On Fri, 28 Feb 2020 at 08:15, Dean Rasheed wrote:
>> It's possible that there are further gains to be had in the sqrt()
>> algorithm on platforms that support 128-bit integers, but I haven't
>> had a chance to investigate that yet.
>
> Rebased patch attached, now using 128-bit integers for part of
> sqrt_var() on platforms that support them. This turned out to be well
> worth it (1.5 to 2 times faster than the previous version if the
> result has less than 30 or 40 digits).

Thank you for these patches; these sound like really nice improvements.

One thing came to my mind while reading the patch:

+* If r < 0 Then
+* Let r = r + 2*s - 1
+* Let s = s - 1

+ /* s is too large by 1; let r = r + 2*s - 1 and s = s - 1 */
+ r_int64 += 2 * s_int64 - 1;
+ s_int64--;

This can be reformulated as:

+* If r < 0 Then
+* Let r = r + s
+* Let s = s - 1
+* Let r = r + s

+ /* s is too large by 1; let r = r + s, s = s - 1, then r = r + s */
+ r_int64 += s_int64;
+ s_int64--;
+ r_int64 += s_int64;

which would remove one mul/shift and the temporary variable. Mind you,
I have not benchmarked this, so it might make little difference, but
maybe it is worth trying it.
Best regards, Telsdiff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c new file mode 100644 index 10229eb..9e6bb80 --- a/src/backend/utils/adt/numeric.c +++ b/src/backend/utils/adt/numeric.c @@ -393,16 +393,6 @@ static const NumericVar const_ten = #endif #if DEC_DIGITS == 4 -static const NumericDigit const_zero_point_five_data[1] = {5000}; -#elif DEC_DIGITS == 2 -static const NumericDigit const_zero_point_five_data[1] = {50}; -#elif DEC_DIGITS == 1 -static const NumericDigit const_zero_point_five_data[1] = {5}; -#endif -static const NumericVar const_zero_point_five = -{1, -1, NUMERIC_POS, 1, NULL, (NumericDigit *) const_zero_point_five_data}; - -#if DEC_DIGITS == 4 static const NumericDigit const_zero_point_nine_data[1] = {9000}; #elif DEC_DIGITS == 2 static const NumericDigit const_zero_point_nine_data[1] = {90}; @@ -518,6 +508,8 @@ static void div_var_fast(const NumericVa static int select_div_scale(const NumericVar *var1, const NumericVar *var2); static void mod_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result); +static void div_mod_var(const NumericVar *var1, const NumericVar *var2, + NumericVar *quot, NumericVar *rem); static void ceil_var(const NumericVar *var, NumericVar *result); static void floor_var(const NumericVar *var, NumericVar *result); @@ -7712,6 +7704,7 @@ div_var_fast(const NumericVar *var1, con NumericVar *result, int rscale, bool round) { int div_ndigits; + int load_ndigits; int res_sign; int res_weight; int *div; @@ -7766,9 +7759,6 @@ div_var_fast(const NumericVar *var1, con div_ndigits += DIV_GUARD_DIGITS; if (div_ndigits < DIV_GUARD_DIGITS) div_ndigits = DIV_GUARD_DIGITS; - /* Must be at least var1ndigits, too, to simplify data-loading loop */ - if (div_ndigits < var1ndigits) - div_ndigits = var1ndigits; /* * We do the arithmetic in an array "div[]" of signed int's. 
Since @@ -7781,9 +7771,16 @@ div_var_fast(const NumericVar *var1, con * (approximate) quotient digit and stores it into div[], removing one * position of dividend space. A final pass of carry propagation takes * care of any mistaken quotient digits. + * + * Note that div[] doesn't necessarily contain all of the digits from the + * dividend --- the desired precision plus guard digits might be less than + * the dividend's precision. This happens, for example, in the square + * root algorithm, where we typically divide a 2N-digit number by an + * N-digit number, and only require a result with N digits of precision. */ div = (int *) palloc0((div_ndigits + 1) * sizeof(int)); - for (i = 0; i < var1ndigits; i++) + load_ndigits = Min(div_ndigits, var1ndigits); + for (i = 0; i < load_ndigits; i++) div[i + 1] = var1digits[i]; /* @@ -7844,9 +7841,15 @@ div_var_fast(const NumericVar *var1, con maxdiv += Abs(qdigit); if (maxdiv > (INT_MAX - INT_MAX / NBASE - 1) / (NBASE - 1)) { -/* Yes, do it */ +/* + * Yes, do it. Note that if var2ndigits is much smaller than + * div_ndigits, we can save a significant amount of effort + * here by noting that we only need to normalise those div[] + * entries touched where prior iterations subtracted multiples + * of the divisor. + */ carry = 0; -for (i = div_ndigits; i > qi; i--) +for (i = Min(qi + var2ndigits - 2, div_ndigits); i > qi; i--) { newdig = div[i] + carry; if
Re: Improve search for missing parent downlinks in amcheck
Hi Alexander, Apologies for the delayed response. I was a little tired from the deduplication project. On Mon, Feb 24, 2020 at 2:54 PM Alexander Korotkov wrote: > Thank you for your review. The revised version is attached. This has bitrot, because of the deduplication patch. Shouldn't be hard to rebase, though. > > 'foo, -inf' > > 'foo, (1,24)' > > 'food, -inf'. <-- This pivot tuple's downlink points to the final leaf > > page that's filled with duplicates of the value 'foo' > > 'food, (3,19)' <-- This pivot tuple's downlink points to the *first* > > leaf page that's filled with duplicates of the value 'food' > > ... > Thank you for the explanation! I taught pageinspect to display a "htid" field for pivot tuples recently, making it easier to visualize this example. I think that you should say more about how "lowkey" is used here: > /* > -* Record if page that is about to become target is the right half of > -* an incomplete page split. This can go stale immediately in > -* !readonly case. > +* Copy current target low key as the high key of right sibling. > +* Allocate memory in upper level context, so it would be cleared > +* after reset of target context. > +* > +* We only need low key for parent check. > */ > - state->rightsplit = P_INCOMPLETE_SPLIT(opaque); > + if (state->readonly && !P_RIGHTMOST(opaque)) > + { Say something about concurrent page splits, since they're the only case where we actually use lowkey. Maybe say something like: "We probably won't end up doing anything with lowkey, but it's simpler for readonly verification to always have it available". > Actually, lowkey is used for removing "cousin page verification blind > spot" when there are incomplete splits. It might happen that we read > child with hikey matching its parent high key only when > bt_child_highkey_check() is called for "minus infinity" key of parent > right sibling. Saving low key helps in this case. That makes sense to me. 
> It appears that these false positives were cause by very basic error made > here: > > if (!first && !P_IGNORE(opaque)) > { > /* blkno probably has missing parent downlink */ > bt_downlink_missing_check(state, rightsplit, blkno, page); > } > > actually it should be > > if (blkno != downlink && !P_IGNORE(opaque)) > { > /* blkno probably has missing parent downlink */ > bt_downlink_missing_check(state, rightsplit, blkno, page); > } > > So "blkno == downlink" means blkno has downlink, not being checked > first in the loop. This is remains of old version of patch which I > forget to clean. Now the check you've described works for me. > > If you still think lowkey check is a problem, please help me figure it out. * I think that these comments could still be clearer: > + /* > +* We're going to try match child high key to "negative > +* infinity key". This normally happens when the last child > +* we visited for target's left sibling was an incomplete > +* split. So, we must be still on the child of target's left > +* sibling. Thus, we should match to target's left sibling > +* high key. Thankfully we saved it, it's called a "low key". > +*/ Maybe start with "We cannot try to match child's high key to a negative infinity key in target, since there is nothing to compare. However...". Perhaps use terms like "cousin page" and "subtree", which can be useful. Alternatively, mention this case in the diagram example at the top of bt_child_highkey_check(). It's tough to write comments like this, but I think it's worth it. Note that a high key is also a pivot tuple, so I wouldn't mention high keys here: > +/* > + * Check if two tuples are binary identical except the block number. So, > + * this function is capable to compare high keys with pivot keys. > + */ > +static bool > +bt_pivot_tuple_identical(IndexTuple itup1, IndexTuple itup2) > +{ v7 looks pretty close to being commitable, though I'll probably want to update some comments that you haven't touched when you commit this. 
I should probably wait until you've committed the patch to go do that. I'm thinking of things like old comments in bt_downlink_check(). I will test the patch properly one more time when you produce a new revision. I haven't really tested it since the last time. -- Peter Geoghegan
Re: Internal key management system
Hi Masahiko Please see below my comments regarding kms_v4.patch that you have shared earlier. (1) There is a discrepancy between the documentation and the actual function definition. For example in func.sgml, it states pg_wrap_key takes argument in text data type but in pg_proc.dat and kmgr.c, the function is defined to take argument in bytea data type. ===> doc/src/sgml/func.sgml + + + pg_wrap_key + + pg_wrap_key(data text) + + + bytea + ===> src/include/catalog/pg_proc.dat +{ oid => '8201', descr => 'wrap the given secret', + proname => 'pg_wrap_key', + provolatile => 'v', prorettype => 'bytea', + proargtypes => 'bytea', prosrc => 'pg_wrap_key' }, ===> src/backend/crypto/kmgr.c +Datum +pg_wrap_key(PG_FUNCTION_ARGS) +{ + bytea *data = PG_GETARG_BYTEA_PP(0); + bytea *res; + int datalen; + int reslen; + int len; + (2) I think the documentation needs to make clear the difference between a key and a user secret. Some parts of it are trying to mix the 2 terms together when they shouldn't. To my understanding, a key is expressed as binary data that is actually used in the encryption and decryption operations. A user secret, on the other hand, is more like a passphrase, expressed as string, that is used to derive a encryption key for subsequent encryption operations. If I just look at the function names "pg_wrap_key" and "pg_unwrap_key", I immediately feel that these functions are used to encapsulate and uncover cryptographic key materials. The input and output to these 2 functions should all be key materials expressed in bytea data type. In previous email discussion, there was only one key material in discussion, called the master key (generated during initdb and stored in cluster), and this somehow automatically makes people (including myself) associate pg_wrap_key and pg_unwrap_key to be used on this master key and raise a bunch of security concerns around it. 
Having read the documentation provided by the patch describing pg_wrap_key and pg_unwrap_key, they seem to serve another purpose. It states that pg_wrap_key is used to encrypt a user-supplied secret (text) with the master key and produce a wrapped secret while pg_unwrap_key does the opposite, so we can prevent user from having to enter the secret in plaintext when using pgcrypto functions. This use case is ok but user secret is not really a cryptographic key material is it? It is more similar to a secret passphrase expressed in text and pg_wrap_key is merely used to turn the passphrase into a wrapped passphrase expressed in bytea. If I give pg_wrap_key with a real key material expressed in bytea, I will not be able to unwrap it properly: Select pg_unwrap_key (pg_wrap_key(E'\\x2b073713476f9f0e761e45c64be8175424d2742e7d53df90b6416f1d84168e8a') ); pg_unwrap_key -- +\x077\x13Go\x0Ev\x1EEK\x17T$t.}SߐAo\x1D\x16 (1 row) Maybe we should rename these SQL functions like this to prevent confusion. => pg_wrap_secret (takes a text, returns a bytea) => pg_unwrap_secret(takes a bytea, returns a text) if there is a use case for users to encapsulate key materials then we can define 2 more wrap functions for these, if there is no use case, don't bother: => pg_wrap_key (takes a bytea, returns a bytea) => pg_unwrap_key (takes a bytea, returns a bytea) (3) I would rephrase "chapter 32: Encryption Key Management Part III. Server Administration" documentation like this: = PostgreSQL supports Encryption Key Management System, which is enabled when PostgreSQL is built with --with-openssl option and cluster_passphrase_command is specified during initdb process. The user-provided cluster_passphrase_command in postgresql.conf and the cluster_passphrase_command specified during initdb process must match, otherwise, the database cluster will not start up. 
The user-provided cluster passphrase is derived into a Key Encryption Key (KEK), which is used to encapsulate the Master Encryption Key generated during the initdb process. The encapsulated Master Encryption Key is stored inside the database cluster. Encryption Key Management System provides several functions to allow users to use the master encryption key to wrap and unwrap their own encryption secrets during encryption and decryption operations. This feature allows users to encrypt and decrypt data without knowing the actual key. = (4) I would rephrase "chapter 32.2: Wrap and Unwrap user secret" documentation like this: Note that I rephrase based on (2) and uses pg_(un)wrap_secret instead. = Encryption key management System provides several functions described in Table 9.97, to
Re: Improving connection scalability: GetSnapshotData()
Hi,

On 2020-03-01 00:36:01 -0800, Andres Freund wrote:
> Here are some numbers for the submitted patch series. I had to cull
> some further improvements to make it more manageable, but I think the
> numbers still are quite convincing.
>
> The workload is a pgbench readonly, with pgbench -M prepared -c $conns
> -j $conns -S -n for each client count. This is on a machine with 2
> Intel(R) Xeon(R) Platinum 8168, but virtualized.
>
> conns   tps master          tps pgxact-split
>
> 1       26842.492845        26524.194821
> 10      246923.158682       249224.782661
> 50      695956.539704       709833.746374
> 100     1054727.043139      1903616.306028
> 200     964795.282957       1949200.338012
> 300     906029.377539       1927881.231478
> 400     845696.690912       1911065.369776
> 500     812295.222497       1926237.255856
> 600     888030.104213       1903047.236273
> 700     866896.532490       1886537.202142
> 800     863407.341506       1883768.592610
> 900     871386.608563       1874638.012128
> 1000    887668.277133       1876402.391502
> 1500    860051.361395       1815103.564241
> 2000    890900.098657       1775435.271018
> 3000    874184.980039       1653953.817997
> 4000    845023.080703       1582582.316043
> 5000    817100.195728       1512260.802371
>
> I think these are pretty nice results.

> One further cool recognition of the fact that GetSnapshotData()'s
> results can be made to only depend on the set of xids in progress, is
> that caching the results of GetSnapshotData() is almost trivial at that
> point: We only need to recompute snapshots when a toplevel transaction
> commits/aborts.
>
> So we can avoid rebuilding snapshots when no commit has happened since
> it was last built. Which amounts to assigning a current 'commit
> sequence number' to the snapshot, and checking that against the current
> number at the time of the next GetSnapshotData() call. Well, turns out
> there's this "LSN" thing we assign to commits (there are some small
> issues with that though). I've experimented with that, and it
> considerably further improves the numbers above.
> Both with a higher peak throughput, but more
> importantly it almost entirely removes the throughput regression from
> 2000 connections onwards.
>
> I'm still working on cleaning that part of the patch up, I'll post it
> in a bit.

I triggered a longer run on the same hardware, that also includes
numbers for the caching patch.

nclients    master          pgxact-split    pgxact-split-cache
1           29742.805074    29086.874404    28120.709885
2           58653.005921    56610.432919    57343.937924
3           116580.383993   115102.94057    117512.656103
4           150821.023662   154130.354635   152053.714824
5           186679.754357   189585.156519   191095.841847
6           219013.756252   223053.409306   224480.026711
7           256861.673892   256709.57311    262427.179555
8           291495.547691   294311.524297   296245.219028
9           332835.641015   333223.666809   335460.280487
10          367883.74842    373562.206447   375682.894433
15          561008.204553   578601.577916   587542.061911
20          748000.911053   794048.140682   810964.700467
25          904581.660543   1037279.089703  1043615.577083
30          999231.007768   1251113.123461  1288276.726489
35          1001274.289847  1438640.653822  1438508.432425
40          991672.445199   1518100.079695  1573310.171868
45          994427.395069   1575758.31948   1649264.339117
50          1017561.371878  1654776.716703  1715762.303282
60          993943.210188   1720318.989894  1789698.632656
70          971379.995255   1729836.303817  1819477.25356
80          966276.137538   1744019.347399  1842248.57152
90          901175.211649   1768907.069263  1847823.970726
100         803175.74326    1784636.397822  1865795.782943
125         664438.039582   1806275.514545  1870983.64688
150         623562.201749   1796229.009658  1876529.428419
175         680683.150597   1809321.487338  1910694.40987
200         668413.988251   1833457.942035  1878391.674828
225         682786.299485   1816577.462613  1884587.77743
250         727308.562076   1825796.324814  1864692.025853
275         676295.999761   1843098.107926  1908698.584573
300         698831.398432   1832068.168744  1892735.290045
400         661534.639489   1859641.983234  1898606.247281
500         645149.788352   1851124.475202  1888589.134422
600         740636.323211   1875152.669115  1880653.747185
700         858645.363292   1833527.505826  1874627.969414
800         858287.957814   1841914.668668  1892106.319085
900         882204.933544   1850998.221969  1868260.041595
1000        910988.551206   1836336.091652  1862945.18557
1500        917727.92827    1808822.338465  1864150.00307
2000        982137.053108   1813070.209217  1877104.342864
3000        1013514.639108  1753026.733843  1870416.924248
4000        1025476.80688   1600598.543635  1859908.314496
5000        1019889.160511  1534501.389169  1870132.571895
7500        968558.864242   1352137.828569  1853825.376742
1           887558.112017
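The "commit sequence number" caching idea Andres describes can be sketched in a few lines: a snapshot records the counter value it was built at, and as long as no top-level transaction has committed or aborted since, the expensive proc-array scan is skipped. All names below are hypothetical stand-ins, not PostgreSQL's actual API:

```c
#include <assert.h>
#include <stdint.h>

/* Bumped (in shared memory, in the real system) on every commit/abort. */
static uint64_t shared_commit_seqno;

typedef struct SnapshotSketch
{
	uint64_t	built_at_seqno;	/* seqno observed when snapshot was built */
	int			valid;
	/* ... xmin, xmax, xip[] would live here ... */
} SnapshotSketch;

/* Counts how often we pay for the expensive rebuild. */
static int	snapshot_rebuilds;

static void
get_snapshot(SnapshotSketch *snap)
{
	uint64_t	cur = shared_commit_seqno;

	if (snap->valid && snap->built_at_seqno == cur)
		return;					/* cache hit: nothing committed since */

	/* cache miss: the real code would scan the proc array here */
	snapshot_rebuilds++;
	snap->built_at_seqno = cur;
	snap->valid = 1;
}
```

The subtlety Andres alludes to ("there's this LSN thing we assign to commits") is that the commit LSN can serve as a ready-made sequence number, modulo the "small issues" he mentions.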
Re: SQL/JSON: functions
On 2020-03-02 23:33, Nikita Glukhov wrote:
> Attached 42nd version of the patches.
>
> v1-0001-Add-jsonpath-pg-modifier-for-enabling-extensions.patch
> v1-0002-Add-raw-jbvArray-and-jbvObject-support-to-jsonpat.patch
> v1-0003-Add-jsonpath-sequence-constructors.patch
> v1-0004-Add-jsonpath-array-constructors.patch
> v1-0005-Add-jsonpath-object-constructors.patch
> v1-0006-Add-jsonpath-object-subscripting.patch

I can't get these to apply against master.

$ patch --dry-run -b -l -F 15 -p 1 < 0001-Jsonpath-support-for-json-v42.patch
can't find file to patch at input line 38

(tried -p 0 and -p 1 -- am I doing it wrong?)

Maybe it needs something else applied first?

Erik Rijkers
Symbolic names for the values of typalign and typstorage
While looking at Tomas' ALTER TYPE patch, I got annoyed by the fact
that all of the backend writes constants of type alignment and type
storage values as literal characters, such as 'i' and 'x'. This is
not our style for most other "poor man's enum" catalog columns, and
it makes it really hard to grep for relevant code. Hence, attached
is a proposed patch to invent #define names for those values.

As is our custom for other similar catalog columns, I only used the
macros in C code. There are some references in SQL code too,
particularly in the regression tests, but the difficulty of replacing
symbolic references in SQL code seems more than it's worth to fix.

One thing that I'm not totally happy about, as this stands, is that
we have to #include "catalog/pg_type.h" in various places we did
not need to before (although only a fraction of the files I touched
need that). Part of the issue is that I used the TYPALIGN_XXX
macros in tupmacs.h, but did not #include pg_type.h there because
I was concerned about macro inclusion bloat. Plausible alternatives
to the way I did it here include

* just bite the bullet and #include pg_type.h in tupmacs.h;

* keep using the hard-coded values in tupmacs.h (with a comment as to
  why);

* put the TYPALIGN_XXX #defines somewhere else (not clear where, but
  there might be a case for postgres.h, since so much of the backend
  has some interest in alignment).

Thoughts? Anybody want to say that this is more code churn
than it's worth?
regards, tom lane diff --git a/contrib/hstore/hstore_gin.c b/contrib/hstore/hstore_gin.c index 4c3a422..9085302 100644 --- a/contrib/hstore/hstore_gin.c +++ b/contrib/hstore/hstore_gin.c @@ -119,7 +119,7 @@ gin_extract_hstore_query(PG_FUNCTION_ARGS) text *item; deconstruct_array(query, - TEXTOID, -1, false, 'i', + TEXTOID, -1, false, TYPALIGN_INT, _datums, _nulls, _count); entries = (Datum *) palloc(sizeof(Datum) * key_count); diff --git a/contrib/hstore/hstore_gist.c b/contrib/hstore/hstore_gist.c index e860f1e..d198c4b 100644 --- a/contrib/hstore/hstore_gist.c +++ b/contrib/hstore/hstore_gist.c @@ -555,7 +555,7 @@ ghstore_consistent(PG_FUNCTION_ARGS) int i; deconstruct_array(query, - TEXTOID, -1, false, 'i', + TEXTOID, -1, false, TYPALIGN_INT, _datums, _nulls, _count); for (i = 0; res && i < key_count; ++i) @@ -578,7 +578,7 @@ ghstore_consistent(PG_FUNCTION_ARGS) int i; deconstruct_array(query, - TEXTOID, -1, false, 'i', + TEXTOID, -1, false, TYPALIGN_INT, _datums, _nulls, _count); res = false; diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c index f3174f2..60bdbea 100644 --- a/contrib/hstore/hstore_io.c +++ b/contrib/hstore/hstore_io.c @@ -560,7 +560,7 @@ hstore_from_arrays(PG_FUNCTION_ARGS) errmsg("wrong number of array subscripts"))); deconstruct_array(key_array, - TEXTOID, -1, false, 'i', + TEXTOID, -1, false, TYPALIGN_INT, _datums, _nulls, _count); /* see discussion in hstoreArrayToPairs() */ @@ -599,7 +599,7 @@ hstore_from_arrays(PG_FUNCTION_ARGS) errmsg("arrays must have same bounds"))); deconstruct_array(value_array, - TEXTOID, -1, false, 'i', + TEXTOID, -1, false, TYPALIGN_INT, _datums, _nulls, _count); Assert(key_count == value_count); @@ -689,7 +689,7 @@ hstore_from_array(PG_FUNCTION_ARGS) } deconstruct_array(in_array, - TEXTOID, -1, false, 'i', + TEXTOID, -1, false, TYPALIGN_INT, _datums, _nulls, _count); count = in_count / 2; diff --git a/contrib/hstore/hstore_op.c b/contrib/hstore/hstore_op.c index fb1bb06..dd79d01 100644 --- 
a/contrib/hstore/hstore_op.c +++ b/contrib/hstore/hstore_op.c @@ -81,7 +81,7 @@ hstoreArrayToPairs(ArrayType *a, int *npairs) j; deconstruct_array(a, - TEXTOID, -1, false, 'i', + TEXTOID, -1, false, TYPALIGN_INT, _datums, _nulls, _count); if (key_count == 0) @@ -583,7 +583,7 @@ hstore_slice_to_array(PG_FUNCTION_ARGS) int i; deconstruct_array(key_array, - TEXTOID, -1, false, 'i', + TEXTOID, -1, false, TYPALIGN_INT, _datums, _nulls, _count); if (key_count == 0) @@ -623,7 +623,7 @@ hstore_slice_to_array(PG_FUNCTION_ARGS) ARR_NDIM(key_array), ARR_DIMS(key_array), ARR_LBOUND(key_array), - TEXTOID, -1, false, 'i'); + TEXTOID, -1, false, TYPALIGN_INT); PG_RETURN_POINTER(aout); } @@ -720,7 +720,7 @@ hstore_akeys(PG_FUNCTION_ARGS) } a = construct_array(d, count, - TEXTOID, -1, false, 'i'); + TEXTOID, -1, false, TYPALIGN_INT); PG_RETURN_POINTER(a); } @@ -767,7 +767,7 @@ hstore_avals(PG_FUNCTION_ARGS) } a = construct_md_array(d, nulls, 1, , , - TEXTOID, -1, false, 'i'); + TEXTOID, -1, false, TYPALIGN_INT); PG_RETURN_POINTER(a); } @@ -819,7 +819,7 @@ hstore_to_array_internal(HStore *hs, int ndims)
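The attached diff only shows `TYPALIGN_INT` in use; a sketch of what the full set of macros might look like follows. Only the `'i'` mapping is confirmed by the patch hunks above; the other names and the one-letter codes (pg_type's documented alignment characters) are my extrapolation of the obvious pattern:

```c
/*
 * Symbolic names for pg_type.typalign values, replacing bare character
 * literals like 'i' so that relevant code becomes greppable.  The
 * TYPSTORAGE_* codes ('p'/'e'/'x'/'m') would get analogous treatment.
 */
#define TYPALIGN_CHAR	'c'		/* char alignment (i.e., unaligned) */
#define TYPALIGN_SHORT	's'		/* short (2-byte) alignment */
#define TYPALIGN_INT	'i'		/* int (4-byte) alignment */
#define TYPALIGN_DOUBLE	'd'		/* double (8-byte) alignment */
```

With these in place, a call such as `deconstruct_array(..., TYPALIGN_INT, ...)` says what it means, and `grep TYPALIGN_INT` finds every caller that assumes 4-byte alignment.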
Re: Portal->commandTag as an enum
> On Mar 2, 2020, at 1:57 PM, Alvaro Herrera wrote: > > On 2020-Mar-02, Mark Dilger wrote: > >> I think it is more natural to change event trigger code to reason >> natively about CommandTags rather than continuing to reason about >> strings. The conversion back-and-forth between the enum and the >> string representation serves no useful purpose that I can see. But I >> understand if you are just trying to have the patch change fewer parts >> of the code, and if you feel more comfortable about reverting that >> part of the patch, as the committer, I think that's your prerogative. > > Nah -- after reading it again, that would make no sense. With the > change, we're forcing all writers of event trigger functions in C to > adapt their functions to the new API, but that's okay -- I don't expect > there will be many, and we're doing other things to the API anyway. > > I pushed it now. Thanks! I greatly appreciate your time. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Allowing ALTER TYPE to change storage strategy
On Mon, Mar 02, 2020 at 02:11:10PM -0500, Tom Lane wrote: I started to look at this patch with fresh eyes after reading the patch for adding binary I/O for ltree, https://www.postgresql.org/message-id/flat/CANmj9Vxx50jOo1L7iSRxd142NyTz6Bdcgg7u9P3Z8o0=hgk...@mail.gmail.com and realizing that the only reasonable way to tackle that problem is to provide an ALTER TYPE command that can set the binary I/O functions for an existing type. (One might think that it'd be acceptable to UPDATE the pg_type row directly; but that wouldn't take care of dependencies properly, and it also wouldn't handle the domain issues I discuss below.) There are other properties too that can be set in CREATE TYPE but we have no convenient way to adjust later, though it'd be reasonable to want to do so. I do not think that we want to invent bespoke syntax for each property. The more such stuff we cram into ALTER TYPE, the bigger the risk of conflicting with some future SQL extension. Moreover, since CREATE TYPE for base types already uses the "( keyword = value, ... )" syntax for these properties, and we have a similar precedent in CREATE/ALTER OPERATOR, it seems to me that the syntax we want here is ALTER TYPE typename SET ( keyword = value [, ... ] ) Agreed, it seems reasonable to use the ALTER OPERATOR precedent. Attached is a stab at doing it that way, and implementing setting of the binary I/O functions for good measure. (It'd be reasonable to add more stuff, like setting the other support functions, but this is enough for the immediate discussion.) The main thing I'm not too happy about is what to do about domains. Your v2 patch supposed that it's okay to allow ALTER TYPE on domains, but I'm not sure we want to go that way, and if we do there's certainly a bunch more work that has to be done. Up to now the system has supposed that domains inherit all these properties from their base types.
I'm not certain exactly how far that assumption has propagated, but there's at least one place that implicitly assumes it: pg_dump has no logic for adjusting a domain to have different storage or support functions than the base type had. So as v2 stands, a custom storage option on a domain would be lost in dump/reload. Another issue that would become a big problem if we allow domains to have custom I/O functions is that the wire protocol transmits the base type's OID, not the domain's OID, for an output column that is of a domain type. A client that expected a particular output format on the strength of what it was told the column type was would be in for a big surprise. Certainly we could fix pg_dump if we had a mind to, but changing the wire protocol for this would have unpleasant ramifications. And I'm worried about whether there are other places in the system that are also making this sort of assumption. I'm also not very convinced that we *want* to allow domains to vary from their base types in this way. The primary use-case I can think of for ALTER TYPE SET is in extension update scripts, and an extension would almost surely wish for any domains over its type to automatically absorb whatever changes of this sort it wants to make. So I think there are two distinct paths we could take here: * Decide that it's okay to allow domains to vary from their base type in these properties. Teach pg_dump to cope with that, and stand ready to fix any other bugs we find, and have some story to tell the people whose clients we break. Possibly add a CASCADE option to ALTER TYPE SET, with the semantics of adjusting dependent domains to match. (This is slightly less scary than the CASCADE semantics you had in mind, because it would only affect pg_type entries not tables.) * Decide that we'll continue to require domains to match their base type in all these properties. 
That means refusing to allow ALTER on a domain per se, and automatically cascading these changes to dependent domains. In the v3 patch below, I've ripped out the ALTER DOMAIN syntax on the assumption that we'd do the latter; but I've not written the cascade recursion logic, because that seemed like a lot of work to do in advance of having consensus on it being a good idea. I do agree we should do the latter, i.e. maintain the assumption that domains have the same properties as their base type. I can't think of a use case for allowing them to differ, it just didn't occur to me there is this implicit assumption when writing the patch. I've also restricted the code to work just on base types, because it's far from apparent to me that it makes any sense to allow any of these operations on derived types such as composites or ranges. Again, there's a fair amount of code that is not going to be prepared for such a type to have properties that it could not have at creation, and I don't see a use-case that really justifies breaking those expectations. Yeah, that makes sense too, I think. regards -- Tomas Vondra http://www.2ndQuadrant.com
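For concreteness, an extension update script using the syntax discussed above might look like this (illustrative only: the option names follow the CREATE TYPE keyword list, and the function names are hypothetical stand-ins for the ltree case that motivated the patch):

```sql
-- Hypothetical extension update script fragment: attach binary I/O
-- functions to a pre-existing type via the proposed ALTER TYPE ... SET
ALTER TYPE ltree SET (
    receive = ltree_recv,
    send    = ltree_send
);
```

Under the second path above, such a command would also cascade to any domains defined over the type, keeping them in sync with the base type's properties.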
Re: Portal->commandTag as an enum
On 2020-Mar-02, Mark Dilger wrote: > I think it is more natural to change event trigger code to reason > natively about CommandTags rather than continuing to reason about > strings. The conversion back-and-forth between the enum and the > string representation serves no useful purpose that I can see. But I > understand if you are just trying to have the patch change fewer parts > of the code, and if you feel more comfortable about reverting that > part of the patch, as the committer, I think that's your prerogative. Nah -- after reading it again, that would make no sense. With the change, we're forcing all writers of event trigger functions in C to adapt their functions to the new API, but that's okay -- I don't expect there will be many, and we're doing other things to the API anyway. I pushed it now. I made very small changes before pushing: notably, I removed the InitializeQueryCompletion() call from standard_ProcessUtility; instead its callers are supposed to do it. Updated ProcessUtility's comment to that effect. Also, the affected plancache.c functions (CreateCachedPlan and CreateOneShotCachedPlan) had not had their comments updated. Previously they received compile-time constants, but that was important because they were strings. No longer. I noticed some other changes that could perhaps be made here, but didn't do them; for instance, in pg_stat_statement we have a comparison to CMDTAG_COPY that seems pointless; we could just acquire the value always. I left it alone for now but I think the change is without side effects (notably so because most actual DML does not go through ProcessUtility anyway). Also, event_trigger.c could resolve command strings to the tag enum earlier. There's also a lot of nonsense in the pquery.c functions, such as this, /* * Now fetch desired portion of results. 
*/ nprocessed = PortalRunSelect(portal, true, count, dest); /* * If the portal result contains a command tag and the caller * gave us a pointer to store it, copy it and update the * rowcount. */ if (qc && portal->qc.commandTag != CMDTAG_UNKNOWN) { CopyQueryCompletion(qc, &portal->qc); qc->nprocessed = nprocessed; } I think we could simplify that by passing the qc. Similar consideration with DoCopy; instead of a 'uint64 nprocessed' we could have a *qc to fill in and avoid this bit of silliness, DoCopy(pstate, (CopyStmt *) parsetree, pstmt->stmt_location, pstmt->stmt_len, &processed); if (qc) SetQueryCompletion(qc, CMDTAG_COPY, processed); I see no reason to have PortalRun() initialize the qc; ISTM that its callers should do so. And so on. Nothing of that is critical. Thanks for your patch, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
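For readers following along, the QueryCompletion mechanism being discussed can be modeled in a few lines. This is a simplified standalone sketch of the idea (a tag enum plus a row count, turned into a string only at the very end in EndCommand()), not the actual tcop code; the tag values here are stand-ins:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the QueryCompletion struct from the commandTag
 * patch: completed-query info is kept as an enum tag plus a count,
 * rather than as a formatted cstring. Tag values are stand-ins. */
typedef enum CommandTag
{
	CMDTAG_UNKNOWN,
	CMDTAG_COPY,
	CMDTAG_SELECT
} CommandTag;

typedef struct QueryCompletion
{
	CommandTag	commandTag;
	uint64_t	nprocessed;
} QueryCompletion;

/* Fill in a completion in one step, as DoCopy's caller does above. */
static void
SetQueryCompletion(QueryCompletion *qc, CommandTag tag, uint64_t nprocessed)
{
	qc->commandTag = tag;
	qc->nprocessed = nprocessed;
}

/* Copy a completion from the portal into the caller-supplied slot. */
static void
CopyQueryCompletion(QueryCompletion *dst, const QueryCompletion *src)
{
	dst->commandTag = src->commandTag;
	dst->nprocessed = src->nprocessed;
}
```

The simplification proposed in the email amounts to passing a QueryCompletion pointer down instead of returning a bare row count and re-assembling the completion afterwards.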
Re: [PATCH] kNN for btree
On Tue, Dec 3, 2019 at 3:00 AM Alexander Korotkov wrote: > On Mon, Sep 9, 2019 at 11:28 PM Alexander Korotkov > wrote: > > On Sun, Sep 8, 2019 at 11:35 PM Alexander Korotkov > > wrote: > > > I'm going to push 0001 changing "attno >= 1" to assert. > > > > Pushed. Rebased patchset is attached. I propose to limit > > consideration during this commitfest to this set of 7 remaining > > patches. The rest of patches could be considered later. I made some > > minor editing in preparation to commit. But I decided I've a couple > > more notes to Nikita. > > > > * 0003 extracts part of fields from BTScanOpaqueData struct into new > > BTScanStateData struct. However, there is a big comment regarding > > BTScanOpaqueData just before definition of BTScanPosItem. It needs to > > be revised. > > * 0004 adds "knnState" field to BTScanOpaqueData in addition to > > "state" field. Wherein "knnState" might be unused during knn scan if it > > could be done in one direction. This seems counter-intuitive. Could > > we rework this to have "forwardState" and "backwardState" fields > > instead of "state" and "knnState"? > > I have reordered patchset into fewer more self-consistent patches. > > Besides comments, code beautification and other improvements, now > there are dedicated fields for forward and backward scan states. The > forward scan state is the pointer to data structure, which is used in > ordinary unidirectional scan. So, it's mostly cosmetic change, but it > improves the code readability. > > One thing bothers me. We have to move scalar distance operators from > btree_gist to core. btree_gist extension upgrade script has to > qualify operators with schema in order to distinguish core and > extension implementations. So, I have to use @extschema@. But it's > forbidden for relocatable extensions. For now, I marked btree_gist as > non-relocatable. Our comment in extension.c says "For a relocatable > extension, we needn't do this.
There cannot be any need for > @extschema@, else it wouldn't be relocatable.". Is it really true? I > think now we have a pretty valid example for a relocatable extension, > which needs @extschema@ in its upgrade script. Any thoughts? I've rebased the patchset to the current master and made some refactoring. I hope it would be possible to bring it to committable shape during this CF. This needs more refactoring, though. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0001-Introduce-IndexAmRoutine.ammorderbyopfirstcol-v17.patch.gz Description: GNU Zip compressed data 0002-Allow-ordering-by-operator-in-ordered-indexes-v17.patch.gz Description: GNU Zip compressed data 0004-Move-scalar-distance-operators-from-btree_gist-t-v17.patch.gz Description: GNU Zip compressed data 0003-Extract-BTScanState-from-BTScanOpaqueData-v17.patch.gz Description: GNU Zip compressed data 0005-Add-knn-support-to-btree-indexes-v17.patch.gz Description: GNU Zip compressed data
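The forward/backward scan-state merge that the patchset implements over btree scan positions can be illustrated with an in-memory sketch. This is not the patch's code; a sorted array stands in for the index, and two cursors play the roles of the forward and backward scan states:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the kNN-by-bidirectional-scan idea: position a backward
 * cursor (left) and a forward cursor (right) around the target value,
 * then repeatedly emit whichever side is closer. A btree kNN scan does
 * the same merge over two index scan states. */
static size_t
knn_sorted(const int *vals, size_t n, int target, int *out)
{
	size_t		right = 0;
	ptrdiff_t	left;
	size_t		k = 0;

	/* find first element >= target; backward cursor starts just before */
	while (right < n && vals[right] < target)
		right++;
	left = (ptrdiff_t) right - 1;

	while (left >= 0 || right < n)
	{
		if (left < 0)
			out[k++] = vals[right++];
		else if (right >= n)
			out[k++] = vals[left--];
		else if (target - vals[left] <= vals[right] - target)
			out[k++] = vals[left--];
		else
			out[k++] = vals[right++];
	}
	return k;
}
```

Stopping the merge after the first k emitted values yields the k nearest neighbors without visiting the rest of the index, which is the point of pushing ORDER BY distance-operator clauses into the btree AM.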
Re: Use compiler intrinsics for bit ops in hash
Hi David, On Wed, Feb 26, 2020 at 9:56 PM David Fetter wrote: > > On Wed, Feb 26, 2020 at 09:12:24AM +0100, David Fetter wrote: > > On Fri, Jan 31, 2020 at 04:59:18PM +0100, David Fetter wrote: > > > On Wed, Jan 15, 2020 at 03:45:12PM -0800, Jesse Zhang wrote: > > > > On Tue, Jan 14, 2020 at 2:09 PM David Fetter wrote: > > > > > > The changes in hash AM and SIMPLEHASH do look like a net positive > > > > > > improvement. My biggest cringe might be in pg_bitutils: > > > > > > > > > > > > 1. Is ceil_log2_64 dead code? > > > > > > > > > > Let's call it nascent code. I suspect there are places it could go, if > > > > > I look for them. Also, it seemed silly to have one without the other. > > > > > > > > > > > > > While not absolutely required, I'd like us to find at least one > > > > place and start using it. (Clang also nags at me when we have > > > > unused functions). > > > > > > Done in the expanded patches attached. I see that you've found use of it in dynahash, thanks! The math in the new (from v4 to v6) patch is wrong: it yields ceil_log2(1) = 1 or next_power_of_2(1) = 2. I can see that you lifted the restriction of "num greater than one" for ceil_log2() in this patch set, but it's now _more_ problematic to base those functions on pg_leftmost_one_pos(). I'm not comfortable with your changes to pg_leftmost_one_pos() to remove the restriction on word being non-zero. Specifically pg_leftmost_one_pos() is made to return 0 on 0 input. While none of its current callers (in HEAD) is harmed, this introduces muddy semantics: 1. pg_leftmost_one_pos is semantically undefined on 0 input: scanning for a set bit in a zero word won't find it anywhere. 2. we can _try_ generalizing it to accommodate ceil_log2 by extrapolating based on the invariant that BSR + LZCNT = 31 (or 63). In that case, the extrapolation yields -1 for pg_leftmost_one_pos(0). I'm not convinced that others on the list will be comfortable with the generalization suggested in 2 above. 
I've quickly put together a PoC patch on top of yours, which re-implements ceil_log2 using LZCNT coupled with a CPUID check. Thoughts? Cheers, Jesse From 0e4392d77b6132a508b7da14871cc99066a9d114 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Fri, 28 Feb 2020 16:22:04 -0800 Subject: [PATCH] Use LZCOUNT when possible This patch reverts the changes to pg_leftmost_one and friends (which is really bit-scan-reverse, and is legitimately undefined one zero) and reworks ceil_log2 and friends to use a count-leading-zeros operation that is well-defined on zero. --- src/include/port/pg_bitutils.h | 47 - src/port/pg_bitutils.c | 77 ++ 2 files changed, 104 insertions(+), 20 deletions(-) diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h index 88a9ea5b7fb..b4d5724ee1d 100644 --- a/src/include/port/pg_bitutils.h +++ b/src/include/port/pg_bitutils.h @@ -20,18 +20,19 @@ extern PGDLLIMPORT const uint8 pg_number_of_ones[256]; /* * pg_leftmost_one_pos32 * Returns the position of the most significant set bit in "word", - * measured from the least significant bit. + * measured from the least significant bit. word must not be 0. */ static inline int pg_leftmost_one_pos32(uint32 word) { #ifdef HAVE__BUILTIN_CLZ - return word ? 31 - __builtin_clz(word) : 0; + Assert(word != 0); + + return 31 - __builtin_clz(word); #else int shift = 32 - 8; - if (word == 0) - return 0; + Assert(word != 0); while ((word >> shift) == 0) shift -= 8; @@ -48,18 +49,19 @@ static inline int pg_leftmost_one_pos64(uint64 word) { #ifdef HAVE__BUILTIN_CLZ + Assert(word != 0); + #if defined(HAVE_LONG_INT_64) - return word ? 63 - __builtin_clzl(word) : 0; + return 63 - __builtin_clzl(word); #elif defined(HAVE_LONG_LONG_INT_64) - return word ? 
63 - __builtin_clzll(word) : 0; + return 63 - __builtin_clzll(word); #else #error must have a working 64-bit integer datatype #endif #else /* !HAVE__BUILTIN_CLZ */ int shift = 64 - 8; - if (word == 0) - return 0; + Assert(word != 0); while ((word >> shift) == 0) shift -= 8; @@ -71,18 +73,19 @@ pg_leftmost_one_pos64(uint64 word) /* * pg_rightmost_one_pos32 * Returns the position of the least significant set bit in "word", - * measured from the least significant bit. + * measured from the least significant bit. word must not be 0. */ static inline int pg_rightmost_one_pos32(uint32 word) { #ifdef HAVE__BUILTIN_CTZ - return word ? __builtin_ctz(word) : 32; + Assert(word != 0); + + return __builtin_ctz(word); #else int result = 0; - if (word == 0) - return 32; + Assert(word != 0); while ((word & 255) == 0) { @@ -102,18 +105,19 @@ static inline int pg_rightmost_one_pos64(uint64 word) { #ifdef HAVE__BUILTIN_CTZ + Assert(word != 0); + #if defined(HAVE_LONG_INT_64) - return word ? __builtin_ctzl(word) : 64; + return __builtin_ctzl(word); #elif defined(HAVE_LONG_LONG_INT_64) - return word ?
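To make the boundary cases in this discussion concrete: the intended semantics are ceil_log2(1) = 0 and next_power_of_2(1) = 1, which a leftmost-one-position primitive only delivers when applied to num - 1, never to zero. A small portable sketch (not the pg_bitutils code, and without the builtin/LZCNT fast paths):

```c
#include <assert.h>
#include <stdint.h>

/* Position of the most significant set bit, measured from the least
 * significant bit; undefined for word == 0, per the restriction argued
 * for above. */
static int
leftmost_one_pos32(uint32_t word)
{
	int			pos = 0;

	assert(word != 0);
	while (word >>= 1)
		pos++;
	return pos;
}

/* Smallest k with 2^k >= num. Note the num == 1 special case: deriving
 * it as leftmost_one_pos32(num) + something yields the wrong
 * ceil_log2(1) = 1 that the email points out. */
static int
ceil_log2_32(uint32_t num)
{
	assert(num >= 1);
	return (num == 1) ? 0 : leftmost_one_pos32(num - 1) + 1;
}

static uint32_t
next_power_of_2_32(uint32_t num)
{
	return ((uint32_t) 1) << ceil_log2_32(num);
}
```

With this formulation, the bit-scan primitive is never asked about a zero word, so its semantics can stay strictly "undefined on 0" as the PoC patch above proposes.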
Re: [PATCH] Erase the distinctClause if the result is unique by definition
On Tue, Mar 3, 2020 at 1:24 AM Andy Fan wrote: > Thank you Tom for the review! > > On Mon, Mar 2, 2020 at 4:46 AM Tom Lane wrote: > >> Andy Fan writes: >> > Please see if you have any comments. Thanks >> >> The cfbot isn't at all happy with this. Its linux build is complaining >> about a possibly-uninitialized variable, and then giving up: >> >> https://travis-ci.org/postgresql-cfbot/postgresql/builds/656722993 >> >> The Windows build isn't using -Werror, but it is crashing in at least >> two different spots in the regression tests: >> >> >> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.81778 >> >> I've not attempted to identify the cause of that. >> >> > Before I submit the patch, I can make sure "make check-world" is > successful, but > since the compile option is not same, so I didn't catch > the possibly-uninitialized > variable. As for the crash on the windows, I didn't get the enough > information > now, I will find a windows server and reproduce the cases. > > I just found the link http://commitfest.cputube.org/ this morning, I will > make sure > the next patch can go pass this test. > > > At a high level, I'm a bit disturbed that this focuses only on DISTINCT >> and doesn't (appear to) have any equivalent intelligence for GROUP BY, >> though surely that offers much the same opportunities for optimization. >> It seems like it'd be worthwhile to take a couple steps back and see >> if we couldn't recast the logic to work for both. >> >> > OK, Looks group by is a bit harder than distinct since the aggregation > function. > I will go through the code to see why to add this logic. > > Can we guarantee any_aggr_func(a) == a if only 1 row is returned? If so, we can do some work on the pathtarget/reltarget by transforming the Aggref to a raw expr. I checked the execution path of the aggregation call; it looks like it depends on the Agg node, which is the thing we want to remove.
> >> * There seem to be some pointless #include additions, eg in planner.c >> the added code doesn't look to justify any of them. Please also >> avoid unnecessary whitespace changes, those also slow down reviewing. >> > Fixed some typos. That may be because I added the header file at one point and then refactored the code but forgot to remove the header file when it was no longer necessary. Do we need to rely on experience to tell whether a header file is needed, or do we have any tool to tell it automatically? regards, Andy Fan
Re: Allowing ALTER TYPE to change storage strategy
I started to look at this patch with fresh eyes after reading the patch for adding binary I/O for ltree, https://www.postgresql.org/message-id/flat/CANmj9Vxx50jOo1L7iSRxd142NyTz6Bdcgg7u9P3Z8o0=hgk...@mail.gmail.com and realizing that the only reasonable way to tackle that problem is to provide an ALTER TYPE command that can set the binary I/O functions for an existing type. (One might think that it'd be acceptable to UPDATE the pg_type row directly; but that wouldn't take care of dependencies properly, and it also wouldn't handle the domain issues I discuss below.) There are other properties too that can be set in CREATE TYPE but we have no convenient way to adjust later, though it'd be reasonable to want to do so. I do not think that we want to invent bespoke syntax for each property. The more such stuff we cram into ALTER TYPE, the bigger the risk of conflicting with some future SQL extension. Moreover, since CREATE TYPE for base types already uses the "( keyword = value, ... )" syntax for these properties, and we have a similar precedent in CREATE/ALTER OPERATOR, it seems to me that the syntax we want here is ALTER TYPE typename SET ( keyword = value [, ... ] ) Attached is a stab at doing it that way, and implementing setting of the binary I/O functions for good measure. (It'd be reasonable to add more stuff, like setting the other support functions, but this is enough for the immediate discussion.) The main thing I'm not too happy about is what to do about domains. Your v2 patch supposed that it's okay to allow ALTER TYPE on domains, but I'm not sure we want to go that way, and if we do there's certainly a bunch more work that has to be done. Up to now the system has supposed that domains inherit all these properties from their base types. 
I'm not certain exactly how far that assumption has propagated, but there's at least one place that implicitly assumes it: pg_dump has no logic for adjusting a domain to have different storage or support functions than the base type had. So as v2 stands, a custom storage option on a domain would be lost in dump/reload. Another issue that would become a big problem if we allow domains to have custom I/O functions is that the wire protocol transmits the base type's OID, not the domain's OID, for an output column that is of a domain type. A client that expected a particular output format on the strength of what it was told the column type was would be in for a big surprise. Certainly we could fix pg_dump if we had a mind to, but changing the wire protocol for this would have unpleasant ramifications. And I'm worried about whether there are other places in the system that are also making this sort of assumption. I'm also not very convinced that we *want* to allow domains to vary from their base types in this way. The primary use-case I can think of for ALTER TYPE SET is in extension update scripts, and an extension would almost surely wish for any domains over its type to automatically absorb whatever changes of this sort it wants to make. So I think there are two distinct paths we could take here: * Decide that it's okay to allow domains to vary from their base type in these properties. Teach pg_dump to cope with that, and stand ready to fix any other bugs we find, and have some story to tell the people whose clients we break. Possibly add a CASCADE option to ALTER TYPE SET, with the semantics of adjusting dependent domains to match. (This is slightly less scary than the CASCADE semantics you had in mind, because it would only affect pg_type entries not tables.) * Decide that we'll continue to require domains to match their base type in all these properties. 
That means refusing to allow ALTER on a domain per se, and automatically cascading these changes to dependent domains. In the v3 patch below, I've ripped out the ALTER DOMAIN syntax on the assumption that we'd do the latter; but I've not written the cascade recursion logic, because that seemed like a lot of work to do in advance of having consensus on it being a good idea. I've also restricted the code to work just on base types, because it's far from apparent to me that it makes any sense to allow any of these operations on derived types such as composites or ranges. Again, there's a fair amount of code that is not going to be prepared for such a type to have properties that it could not have at creation, and I don't see a use-case that really justifies breaking those expectations. Thoughts? regards, tom lane diff --git a/doc/src/sgml/ref/alter_type.sgml b/doc/src/sgml/ref/alter_type.sgml index 67be1dd..3465d3f 100644 --- a/doc/src/sgml/ref/alter_type.sgml +++ b/doc/src/sgml/ref/alter_type.sgml @@ -30,6 +30,7 @@ ALTER TYPE name RENAME TO name SET SCHEMA new_schema ALTER TYPE name ADD VALUE [ IF NOT EXISTS ] new_enum_value [ { BEFORE | AFTER } neighbor_enum_value ] ALTER TYPE name RENAME VALUE existing_enum_value TO new_enum_value +ALTER TYPE name SET (
[PATCH] Documentation bug related to client authentication using TLS certificate
Hi, I found a documentation bug about client authentication using TLS certificates. When clientcert authentication is enabled in pg_hba.conf, libpq does not verify that the common name in the certificate matches the database username, as described in the documentation, before allowing the client connection. Instead, when sslmode is set to “verify-full”, libpq will verify that the server host name matches the common name in the server certificate. When sslmode is set to “verify-ca”, libpq will verify that the server is trustworthy by checking the certificate trust chain up to the root certificate, and it does not verify that the server hostname and certificate common name match in this case. The attached patch corrects the clientcert authentication description in the documentation. Cheers, Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca client_cert_auth.patch Description: Binary data
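For context, the clientcert check in question is configured on the server side in pg_hba.conf; an illustrative entry (addresses and scope are hypothetical, and option spellings vary somewhat by PostgreSQL version):

```
# Illustrative pg_hba.conf entry: the "cert" auth method authenticates
# a connection by validating the client certificate against the trusted
# CAs and matching its CN against the requested database user name.
hostssl  all  all  0.0.0.0/0  cert
```

The sslmode settings discussed in the message, by contrast, live on the libpq client side (e.g. in the connection string) and govern how the client validates the server's certificate.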
Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
On 2020-03-02 07:53, Michael Paquier wrote: + * For fixed-size files, the caller may pass the expected size as an + * additional crosscheck on successful recovery. If the file size is not + * known, set expectedSize = 0. + */ +int +RestoreArchivedWALFile(const char *path, const char *xlogfname, + off_t expectedSize, const char *restoreCommand) Actually, expectedSize is IMO a bad idea, because any caller of this routine passing down zero could be trapped with an incorrect file size. So let's remove the behavior where it is possible to bypass this sanity check. We don't need it in pg_rewind either. OK, sounds reasonable, but just to be clear: I will remove only the possibility of bypassing this sanity check (with 0), but leave the expectedSize argument intact. We still need it, since pg_rewind takes WalSegSz from ControlFile and should pass it further, am I right? + /* Remove trailing newline */ + if (strchr(cmd_output, '\n') != NULL) + *strchr(cmd_output, '\n') = '\0'; It seems to me that what you are looking for here is to use pg_strip_crlf(). Thinking harder, we have pipe_read_line() in src/common/exec.c which does the exact same job. pg_strip_crlf fits well, but would you mind if I also make pipe_read_line external in this patch? - /* -* construct the command to be executed -*/ Perhaps you meant "build" here. Actually, the verb 'construct' has historically been applied to archive/restore commands (see also xlogarchive.c and pgarch.c), but it should be 'build' in (fe_)archive.c, since we have BuildRestoreCommand there now. All other remarks are clear to me, so I will fix them in the next patch version, thanks. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com The Russian Postgres Company
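For reference, the behavior pg_strip_crlf() provides, and which the open-coded strchr() snippet above only partially implements, can be sketched as follows (a simplified standalone version, not the src/common implementation):

```c
#include <assert.h>
#include <string.h>

/* Strip trailing newline and carriage-return characters in place and
 * return the new length. Unlike the strchr('\n') snippet above, this
 * also handles "\r\n" endings and multiple trailing line terminators. */
static int
strip_crlf(char *str)
{
	int			len = (int) strlen(str);

	while (len > 0 && (str[len - 1] == '\n' || str[len - 1] == '\r'))
		str[--len] = '\0';
	return len;
}
```

This is exactly the kind of cleanup needed after reading a command's output through a pipe, which is why pipe_read_line() in src/common/exec.c is mentioned as already doing the same job.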
Re: [PATCH] Erase the distinctClause if the result is unique by definition
Thank you Tom for the review! On Mon, Mar 2, 2020 at 4:46 AM Tom Lane wrote: > Andy Fan writes: > > Please see if you have any comments. Thanks > > The cfbot isn't at all happy with this. Its linux build is complaining > about a possibly-uninitialized variable, and then giving up: > > https://travis-ci.org/postgresql-cfbot/postgresql/builds/656722993 > > The Windows build isn't using -Werror, but it is crashing in at least > two different spots in the regression tests: > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.81778 > > I've not attempted to identify the cause of that. > > Before I submit the patch, I can make sure "make check-world" is successful, but since the compile options are not the same, I didn't catch the possibly-uninitialized variable. As for the crash on Windows, I don't have enough information yet; I will find a Windows server and reproduce the cases. I just found the link http://commitfest.cputube.org/ this morning; I will make sure the next patch can pass this test. At a high level, I'm a bit disturbed that this focuses only on DISTINCT > and doesn't (appear to) have any equivalent intelligence for GROUP BY, > though surely that offers much the same opportunities for optimization. > It seems like it'd be worthwhile to take a couple steps back and see > if we couldn't recast the logic to work for both. > > OK. It looks like GROUP BY is a bit harder than DISTINCT because of the aggregation functions. I will go through the code to see how to add this logic. > Some other random comments: > > * Don't especially like the way you broke query_is_distinct_for() > into two functions, especially when you then introduced a whole > lot of other code in between. I did not expect this until you pointed it out. In this case, I had to break query_is_distinct_for into two functions, but it is true that we should put the two functions together. That's just making reviewer's job > harder to see what changed.
It makes the comments a bit disjointed > too, that is where you even had any. (Zero introductory comment > for query_is_distinct_agg is *not* up to project coding standards. > There are a lot of other undercommented places in this patch, too.) > > * Definitely don't like having query_distinct_through_join re-open > all the relations. The data needed for this should get absorbed > while plancat.c has the relations open at the beginning. (Non-nullness > of columns, in particular, seems like it'll be useful for other > purposes; I'm a bit surprised the planner isn't using that already.) > I can add new attributes to RelOptInfo and fill in the values in the get_relation_info call. > * In general, query_distinct_through_join seems hugely messy, expensive, > and not clearly correct. If it is correct, the existing comments sure > aren't enough to explain what it is doing or why. > Removing the relation_open call can make it a bit simpler; I will try to add more comments to make it clearer in the following patch. > * There seem to be some pointless #include additions, eg in planner.c > the added code doesn't look to justify any of them. Please also > avoid unnecessary whitespace changes, those also slow down reviewing. > > That may be because I added the header file at some point and then refactored the code later but forgot to remove the header file accordingly. Do we need to rely on experience to tell whether a header file is needed, or do we have any tool to tell it automatically? > * I see you decided to add a new regression test file select_distinct_2. > That's a poor choice of name because it conflicts with our rules for the > naming of alternative output files. Besides which, you forgot to plug > it into the test schedule files, so it isn't actually getting run. > Is there a reason not to just add the new test cases to select_distinct? > > Adding it to select_distinct.sql is ok for me as well. Actually I have no obvious reason to add the new file.
> * There are some changes in existing regression cases that aren't > visibly related to the stated purpose of the patch, eg it now > notices that "select distinct max(unique2) from tenk1" doesn't > require an explicit DISTINCT step. That's not wrong, but I wonder > if maybe you should subdivide this patch into more than one patch, > because that must be coming from some separate change. I'm also > wondering what caused the plan change in expected/join.out. > For my purposes it should be in the same patch; the logic here is that we have DISTINCT in the SQL and the query is already distinct because of the max function (the rule is defined in query_is_distinct_agg, which is split from the original query_is_distinct_for). > regards, tom lane >
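The core observation behind the non-aggregate part of the patch can be sketched in a few lines. This is an illustration of the idea only, not the patch's query_is_distinct_for()/query_distinct_through_join() code; integer column numbers stand in for real attribute metadata:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch: the DISTINCT step can be erased when all columns of some
 * unique, non-nullable index are contained in the DISTINCT column
 * list, because every output row is then already unique. */
static bool
col_in_list(const int *cols, int ncols, int col)
{
	for (int i = 0; i < ncols; i++)
		if (cols[i] == col)
			return true;
	return false;
}

static bool
distinct_is_redundant(const int *distinct_cols, int ndistinct,
					  const int *unique_index_cols, int nindex)
{
	for (int i = 0; i < nindex; i++)
		if (!col_in_list(distinct_cols, ndistinct, unique_index_cols[i]))
			return false;
	return true;
}
```

The aggregate case discussed above ("select distinct max(unique2) ...") is the other branch of the same reasoning: an ungrouped aggregate query produces at most one row, so its output is trivially distinct.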
Re: Portal->commandTag as an enum
> On Mar 2, 2020, at 9:08 AM, Alvaro Herrera wrote: > > On 2020-Mar-02, Alvaro Herrera wrote: > >> Here's the patch I propose for commit. I also rewrote the commit >> message. > > BTW I wonder if we should really change the definition of > EventTriggerData. ISTM that it would be sensible to keep it the same > for now ... I think it is more natural to change event trigger code to reason natively about CommandTags rather than continuing to reason about strings. The conversion back-and-forth between the enum and the string representation serves no useful purpose that I can see. But I understand if you are just trying to have the patch change fewer parts of the code, and if you feel more comfortable about reverting that part of the patch, as the committer, I think that's your prerogative. Did you want to do that yourself, or have me do it and resubmit? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Portal->commandTag as an enum
On 2020-Mar-02, Alvaro Herrera wrote: > Here's the patch I propose for commit. I also rewrote the commit > message. BTW I wonder if we should really change the definition of EventTriggerData. ISTM that it would be sensible to keep it the same for now ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Portal->commandTag as an enum
Here's the patch I propose for commit. I also rewrote the commit message. There are further refinements that can be done, but they don't need to be in the first patch. Notably, the event trigger code can surely do a lot better now by translating the tag list to a bitmapset earlier. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 5e5605ec913fb4ea89665debc57d430bbda34b8f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Mon, 2 Mar 2020 13:39:16 -0300
Subject: [PATCH v7] Migrating commandTag from string to enum.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The backend was using strings to represent command tags and doing string comparisons in multiple places, but that's slow and unhelpful. Create a new command list with a supporting structure to use instead; this is stored in a tag-list-file that can be tailored to specific purposes with a caller-definable C macro, similar to what we do for WAL resource managers. The first such uses are a new CommandTag enum and a CommandTagBehavior struct.

Replace numerous occurrences of char *completionTag with a QueryCompletion struct so that the code no longer stores information about completed queries in a cstring. Only at the last moment, in EndCommand(), does this get converted to a string. EventTriggerCacheItem no longer holds an array of palloc'd tag strings in sorted order, but rather just a Bitmapset over the CommandTags.
Author: Mark Dilger
Reviewed-by: John Naylor, Tom Lane, Álvaro Herrera
Discussion: https://postgr.es/m/981a9db4-3f0c-4da5-88ad-cb9cff4d6...@enterprisedb.com
---
 .../pg_stat_statements/pg_stat_statements.c |  18 +-
 contrib/sepgsql/hooks.c                     |   6 +-
 doc/src/sgml/event-trigger.sgml             |   2 +-
 src/backend/commands/createas.c             |  14 +-
 src/backend/commands/event_trigger.c        | 160 +
 src/backend/commands/matview.c              |   2 +-
 src/backend/commands/portalcmds.c           |  16 +-
 src/backend/commands/prepare.c              |   4 +-
 src/backend/executor/execMain.c             |   4 +-
 src/backend/executor/functions.c            |   4 +-
 src/backend/executor/spi.c                  |  22 +-
 src/backend/replication/logical/decode.c    |   2 +-
 src/backend/replication/walsender.c         |  18 +-
 src/backend/tcop/Makefile                   |   1 +
 src/backend/tcop/cmdtag.c                   |  98 +++
 src/backend/tcop/dest.c                     |  32 +-
 src/backend/tcop/postgres.c                 |  24 +-
 src/backend/tcop/pquery.c                   | 112 ++--
 src/backend/tcop/utility.c                  | 560 +-
 src/backend/utils/cache/evtcache.c          |  30 +-
 src/backend/utils/cache/plancache.c         |   4 +-
 src/backend/utils/mmgr/portalmem.c          |   6 +-
 src/include/commands/createas.h             |   3 +-
 src/include/commands/event_trigger.h        |   3 +-
 src/include/commands/matview.h              |   2 +-
 src/include/commands/portalcmds.h           |   2 +-
 src/include/commands/prepare.h              |   2 +-
 src/include/tcop/cmdtag.h                   |  56 ++
 src/include/tcop/cmdtaglist.h               | 218 +++
 src/include/tcop/dest.h                     |   6 +-
 src/include/tcop/pquery.h                   |   2 +-
 src/include/tcop/utility.h                  |  15 +-
 src/include/utils/evtcache.h                |   3 +-
 src/include/utils/plancache.h               |   7 +-
 src/include/utils/portal.h                  |   6 +-
 src/pl/plperl/plperl.c                      |   2 +-
 src/pl/plpgsql/src/pl_exec.c                |  10 +-
 src/pl/tcl/pltcl.c                          |   3 +-
 .../test_ddl_deparse/test_ddl_deparse.c     |   2 +-
 39 files changed, 873 insertions(+), 608 deletions(-)
 create mode 100644 src/backend/tcop/cmdtag.c
 create mode 100644 src/include/tcop/cmdtag.h
 create mode 100644 src/include/tcop/cmdtaglist.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index e4fda4b404..7b8e690c95 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -307,7 +307,7 @@ static void pgss_ExecutorEnd(QueryDesc *queryDesc);
 static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					ProcessUtilityContext context, ParamListInfo params,
 					QueryEnvironment *queryEnv,
-					DestReceiver *dest, char *completionTag);
+					DestReceiver *dest, QueryCompletion *qc);
 static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 					int query_location, int query_len,
@@ -960,7 +960,7 @@ static void
 pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
Re: Portal->commandTag as an enum
On 2020-Mar-02, Mark Dilger wrote: > > I don't > > understand your point of pg_stats_sql having to deal with this in a > > particular way. How is that patch obtaining the command tags? I would > > hope it calls GetCommandTagName() rather than call CommandEnd, but maybe > > I misunderstand where it hooks. > > My objection was based on my misunderstanding of what Tom was requesting. > > I can rework the patch the way Tom suggests. I already did it :-) Posting in a jiffy -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Portal->commandTag as an enum
> On Mar 2, 2020, at 8:12 AM, Alvaro Herrera wrote: > > On 2020-Feb-29, Mark Dilger wrote: > >> You may want to think about the embedding of InvalidOid into the EndCommand >> output differently from how you think about the embedding of the rowcount >> into the EndCommand output, but my preference is to treat these issues the >> same and make a strong distinction between the commandtag and the embedded >> oid and/or rowcount. It's hard to say how many future features would be >> crippled by having the embedded InvalidOid in the commandtag, but as an >> example *right now* in the works, we have a feature to count how many >> commands of a given type have been executed. It stands to reason that >> feature, whether accepted in its current form or refactored, would not want >> to show users a pg_stats table like this: >> >> cnt command >> - >> 5 INSERT 0 >> 37 SELECT >> >> What the heck is the zero doing after the INSERT? That's the hardcoded >> InvalidOid that you are apparently arguing for. You could get around that >> by having the pg_sql_stats patch have its own separate set of command tag >> strings, but why would we intentionally design that sort of duplication into >> the solution? > > This is what I think Tom means to use in EndCommand: > > if (command_tag_display_rowcount(tag) && !force_undecorated_output) > snprintf(completionTag, COMPLETION_TAG_BUFSIZE, > tag == CMDTAG_INSERT ? > "%s 0 " UINT64_FORMAT : "%s " UINT64_FORMAT, > tagname, qc->nprocessed); > else > ... no rowcount ... > > The point is not to change the returned tag in any way -- just to make > the method to arrive at it not involve the additional data column in the > data file, instead hardcode the behavior in EndCommand. Thanks, Álvaro, I think I get it now. I thought Tom was arguing to have "INSERT 0" rather than "INSERT" be the commandtag. > I don't > understand your point of pg_stats_sql having to deal with this in a > particular way. How is that patch obtaining the command tags?
> I would hope it calls GetCommandTagName() rather than call CommandEnd, but maybe > I misunderstand where it hooks. My objection was based on my misunderstanding of what Tom was requesting. I can rework the patch the way Tom suggests. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Portal->commandTag as an enum
On 2020-Feb-29, Mark Dilger wrote: > You may want to think about the embedding of InvalidOid into the EndCommand > output differently from how you think about the embedding of the rowcount > into the EndCommand output, but my preference is to treat these issues the > same and make a strong distinction between the commandtag and the embedded > oid and/or rowcount. It's hard to say how many future features would be > crippled by having the embedded InvalidOid in the commandtag, but as an > example *right now* in the works, we have a feature to count how many > commands of a given type have been executed. It stands to reason that > feature, whether accepted in its current form or refactored, would not want > to show users a pg_stats table like this: > > cnt command > - > 5 INSERT 0 > 37 SELECT > > What the heck is the zero doing after the INSERT? That's the hardcoded > InvalidOid that you are apparently arguing for. You could get around that by > having the pg_sql_stats patch have its own separate set of command tag > strings, but why would we intentionally design that sort of duplication into > the solution? This is what I think Tom means to use in EndCommand: if (command_tag_display_rowcount(tag) && !force_undecorated_output) snprintf(completionTag, COMPLETION_TAG_BUFSIZE, tag == CMDTAG_INSERT ? "%s 0 " UINT64_FORMAT : "%s " UINT64_FORMAT, tagname, qc->nprocessed); else ... no rowcount ... The point is not to change the returned tag in any way -- just to make the method to arrive at it not involve the additional data column in the data file, instead hardcode the behavior in EndCommand. I don't understand your point of pg_stats_sql having to deal with this in a particular way. How is that patch obtaining the command tags? I would hope it calls GetCommandTagName() rather than call CommandEnd, but maybe I misunderstand where it hooks. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Replica sends an incorrect epoch in its hot standby feedback to the Master
On Fri, Feb 21, 2020 at 8:15 PM Palamadai, Eka wrote: Please, do not top post on this list. > Thanks a lot for the feedback. Please let me know if you have any further > comments. Meanwhile, I have also added this patch to "Commitfest 2020-03" > at https://commitfest.postgresql.org/27/2464. Apparently, there are a couple of duplicate entries in the commitfest as: https://commitfest.postgresql.org/27/2463/ and https://commitfest.postgresql.org/27/2462/ Could you close those as "withdrawn"? Regards, Juan José Santamaría Flecha
Re: Auxiliary Processes and MyAuxProc
On Sat, Feb 29, 2020 at 9:51 PM Mike Palmiotto wrote: > > On Sat, Nov 30, 2019 at 9:15 PM Michael Paquier wrote: > > > > On Thu, Oct 03, 2019 at 11:39:37AM -0700, Andres Freund wrote: > > > Color me unconvinced. > > > > The latest comments of the thread have not been addressed yet, so I am > > marking the patch as returned with feedback. If you think that's not > > correct, please feel free to update the status of the patch. If you > > do so, please provide at the same time a rebased version of the patch, > > as it is failing to apply on HEAD. > > I've finally had some time to update the patchset with an attempt at > addressing Andres' concerns. Note that the current spin has a bug > which does not allow syslogger to run properly. Additionally, it is > squashed into one patch again. I intend to split the patch out into > separate functional patches and fix the syslogger issue, but wanted to > get this on the ticket for the open CommitFest before it closes. I'll > go ahead and re-add it and will be pushing updates as I have them. Okay, here is an updated and rebased patch that passes all regression tests with and without EXEC_BACKEND. This also addresses more of the issues raised by Andres. I still need to do the following:
- split giant patch out into separate functional commits
- add translated process descriptions
- fix some variable names here and there (notably the PgSubprocess struct ".progname" member -- that's a misnomer)
- address some valgrind findings
I should be able to split this out into smaller commits sometime today and will continue iterating to scratch the other items off the list. Thanks, -- Mike Palmiotto https://crunchydata.com

From a497a8905fa7f25d41f831e83174548315fa658f Mon Sep 17 00:00:00 2001
From: Mike Palmiotto
Date: Tue, 19 Feb 2019 15:29:33 +0000
Subject: [PATCH 2/2] Add a hook to allow extensions to set worker metadata

This patch implements a hook that allows extensions to modify a worker process' metadata in backends.
Additional work done by Yuli Khodorkovskiy
Original discussion here: https://www.postgresql.org/message-id/flat/CAMN686FE0OdZKp9YPO%3DhtC6LnA6aW4r-%2Bjq%3D3Q5RAoFQgW8EtA%40mail.gmail.com
---
 src/backend/postmaster/fork_process.c | 11 +++
 src/include/postmaster/fork_process.h |  4
 2 files changed, 15 insertions(+)

diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index def3cee37e..e392f5207e 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -22,6 +22,14 @@
 #include "postmaster/fork_process.h"
 
+/*
+ * This hook allows plugins to get control of the child process following
+ * a successful fork_process(). It can be used to perform some action in
+ * extensions to set metadata in backends (including special backends) upon
+ * setting of the session user.
+ */
+ForkProcess_post_hook_type ForkProcess_post_hook = NULL;
+
 #ifndef WIN32
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
@@ -114,6 +122,9 @@ fork_process(void)
 #ifdef USE_OPENSSL
 		RAND_cleanup();
 #endif
+
+		if (ForkProcess_post_hook)
+			(*ForkProcess_post_hook) ();
 	}
 
 	return result;

diff --git a/src/include/postmaster/fork_process.h b/src/include/postmaster/fork_process.h
index a42f859a08..b4d1560b72 100644
--- a/src/include/postmaster/fork_process.h
+++ b/src/include/postmaster/fork_process.h
@@ -20,4 +20,8 @@ extern pid_t fork_process(void);
 extern pid_t postmaster_forkexec(int argc, char *argvp[]);
 #endif
 
+/* Hook for plugins to get control after a successful fork_process() */
+typedef void (*ForkProcess_post_hook_type) ();
+extern PGDLLIMPORT ForkProcess_post_hook_type ForkProcess_post_hook;
+
 #endif							/* FORK_PROCESS_H */
-- 
2.21.0

From 682aaa548249be06a6693839f1877946de8ca927 Mon Sep 17 00:00:00 2001
From: Mike Palmiotto
Date: Fri, 27 Sep 2019 12:28:19 -0400
Subject: [PATCH 1/2] Introduce subprocess infrastructure

This is an initial attempt at coalescing all of the postmaster subprocess startups into one central framework. The patchset introduces a MySubprocess global, which contains a PgSubprocess entry from the process_types array. Each entry has defined entrypoints, cleanup functions, prep functions, and a few other control-oriented variables. This is a rework of https://commitfest.postgresql.org/25/2259/, which attempts to address the feedback from Andres Freund.
---
 src/backend/bootstrap/bootstrap.c     | 103 +--
 src/backend/main/main.c               |   1 +
 src/backend/postmaster/Makefile       |   1 +
 src/backend/postmaster/autovacuum.c   | 170 +---
 src/backend/postmaster/bgworker.c     | 157 +++-
 src/backend/postmaster/bgwriter.c     |   2 +-
 src/backend/postmaster/checkpointer.c |   2 +-
 src/backend/postmaster/pgarch.c       |  83 +-
 src/backend/postmaster/pgstat.c       | 195 +
 src/backend/postmaster/postmaster.c   | 890 +++-
Re: [Proposal] Global temporary tables
On 2/27/20 9:43 AM, 曾文旌(义从) wrote: -- Scenario 2: Here I am getting the same error message in both the below cases. We may add a "global" keyword with the GTT-related error message. postgres=# create global temporary table gtt1 (c1 int unique); CREATE TABLE postgres=# create temporary table tmp1 (c1 int unique); CREATE TABLE postgres=# create temporary table tmp2 (c1 int references gtt1(c1) ); ERROR: constraints on temporary tables may reference only temporary tables postgres=# create global temporary table gtt2 (c1 int references tmp1(c1) ); ERROR: constraints on temporary tables may reference only temporary tables Fixed in global_temporary_table_v15-pg13.patch Thanks Wenjing. This below scenario is not working, i.e. even when 'on_commit_delete_rows' is true, rows are NOT removed after commit: postgres=# create global temp table foo1(n int) with (on_commit_delete_rows='true'); CREATE TABLE postgres=# begin; BEGIN postgres=*# insert into foo1 values (9); INSERT 0 1 postgres=*# insert into foo1 values (9); INSERT 0 1 postgres=*# select * from foo1; n --- 9 9 (2 rows) postgres=*# commit; COMMIT postgres=# select * from foo1; -- after commit - there should be 0 rows as on_commit_delete_rows is 'true' n --- 9 9 (2 rows) postgres=# \d+ foo1 Table "public.foo1" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description +-+---+--+-+-+--+- n | integer | | | | plain | | Access method: heap Options: on_commit_delete_rows=true postgres=# but if the user creates the table this way then it is working as expected: postgres=# create global temp table foo2(n int) on commit delete rows; CREATE TABLE postgres=# begin; insert into foo2 values (9); insert into foo2 values (9); commit; select * from foo2; BEGIN INSERT 0 1 INSERT 0 1 COMMIT n --- (0 rows) postgres=# I guess the problem is something with this syntax - create global temp table foo1(n int) with (on_commit_delete_rows='true'); -- regards, tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company
Re: Fastpath while arranging the changes in LSN order in logical decoding
Hi Dilip, On 2/18/20 7:30 PM, David Zhang wrote: After manually applied the patch, a diff regenerated is attached. David's updated patch applies but all logical decoding regression tests are failing on cfbot. Do you know when you will be able to supply an updated patch? Regards, -- -David da...@pgmasters.net
Re: Berserk Autovacuum (let's save next Mandrill)
On Tue, 2020-01-07 at 19:05 +0100, Tomas Vondra wrote: > This patch is currently in "needs review" state, but that seems quite > wrong - there's been a lot of discussions about how we might improve > behavior for append-only-tables, but IMO there's no clear consensus nor > a patch that we might review. > > So I think this should be either "waiting on author" or maybe "rejected > with feedback". Is there any chance of getting a reviewable patch in the > current commitfest? If not, I propose to mark it as RWF. > > I still hope we can improve this somehow in time for PG13. The policy is > not to allow new non-trivial patches in the last CF, but hopefully this > might be considered an old patch. I think that no conclusion was reached because there are *many* things that could be improved, and *many* interesting and ambitious ideas were vented. But I think it would be good to have *something* that addresses the immediate problem (INSERT-only tables are autovacuumed too late), as long as that does not have negative side-effects or block further improvements. I am not totally comfortable with the very simplistic approach of this patch (use the same metric to trigger autoanalyze and autovacuum), but what about this:
- a new table storage option autovacuum_vacuum_insert_threshold, perhaps a GUC of the same name, by default deactivated.
- if tabentry->tuples_inserted exceeds this threshold, but not one of the others, launch autovacuum with index_cleanup=off.
How would you feel about that? Yours, Laurenz Albe
Re: doc: vacuum full, fillfactor, and "extra space"
On 1/30/20 6:54 AM, Amit Kapila wrote: On Wed, Jan 29, 2020 at 9:10 PM Peter Eisentraut wrote: On 2020-01-20 06:30, Justin Pryzby wrote: Rebased against 40d964ec997f64227bc0ff5e058dc4a5770a70a9 I'm not sure that description of parallel vacuum in the middle of non-full vs. full vacuum is actually that good. I have done like that because parallel vacuum is the default. I mean when the user runs vacuum command, it will invoke workers to perform index cleanup based on some conditions. I think those sentences should be moved to a separate paragraph. It seems more natural to me to add immediately after vacuum explanation, but I might be wrong. After the above explanation, if you still think it is better to move into a separate paragraph, I can do that. Peter, do you still think this should be moved into a separate paragraph? Regards, -- -David da...@pgmasters.net
Re: Psql patch to show access methods info
Hi Alexander, On 1/21/20 5:37 PM, Alvaro Herrera wrote: On 2020-Jan-21, Alvaro Herrera wrote: c) it would be damn handy if \dAf (maybe \dAf+) lists the datatypes that each opfamily has opclasses for. Maybe make the output an array, like {int4,int8,numeric,...} Something like [*] but somehow make it prettier? Sorry, I forgot to copy-edit my text here: I said "make it prettier", but the query I submitted is already pretty enough ISTM; I had written that comment when I only had the array_agg() version, but then I changed it to string_agg() and that seems to have mostly done the trick. Maybe improve the format_type() bit to omit the quotes, if possible, but that doesn't seem a big deal. The last CF for PG13 has now started. Do you know when you'll be able to supply a new patch to address Álvaro's review? Regards, -- -David da...@pgmasters.net
Silence compiler warnings with Python 3.9
Starting with Python 3.9, the Python headers contain inline functions that fall afoul of our -Wdeclaration-after-statement coding style. In order to silence those warnings, I've added some GCC-specific contortions to disable that warning for Python.h only. Clang doesn't appear to warn about this at all; maybe it recognizes that this is an external header file. We could also write a configure check for this if we want to be more flexible. (Attempts to convince upstream to change the coding style were unsuccessful (https://bugs.python.org/issue39615).) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From e63a5935b7059d70418b63bd4e4b7ccb487e2075 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Mon, 2 Mar 2020 09:11:05 +0100
Subject: [PATCH] Silence compiler warnings with Python 3.9

Starting with Python 3.9, the Python headers contain inline functions that fall afoul of our -Wdeclaration-after-statement coding style, so add some contortions to disable that warning for Python.h only. Attempts to convince upstream to change the coding style were unsuccessful (https://bugs.python.org/issue39615).
---
 src/pl/plpython/plpython.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/pl/plpython/plpython.h b/src/pl/plpython/plpython.h
index 6d981a0a06..7f2256e9e2 100644
--- a/src/pl/plpython/plpython.h
+++ b/src/pl/plpython/plpython.h
@@ -42,6 +42,21 @@
 #undef vprintf
 #undef printf
 
+/*
+ * Starting with Python 3.9, the Python headers contain inline functions that
+ * fall afoul of our -Wdeclaration-after-statement coding style, so we need to
+ * do a little dance here to disable that warning momentarily.
+ */
+
+#ifdef __GNUC__
+#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
+#endif
+
+#if defined(__GNUC__) && GCC_VERSION >= 40600
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeclaration-after-statement"
+#endif
+
 #if defined(_MSC_VER) && defined(_DEBUG)
 /* Python uses #pragma to bring in a non-default libpython on VC++ if
  * _DEBUG is defined */
@@ -59,6 +74,10 @@
 #include <Python.h>
 #endif
 
+#if defined(__GNUC__) && GCC_VERSION >= 40600
+#pragma GCC diagnostic pop
+#endif
+
 /*
  * Python 2/3 strings/unicode/bytes handling. Python 2 has strings
  * and unicode, Python 3 has strings, which are unicode on the C
-- 
2.25.0
Re: Minor issues in .pgpass
On 2020/02/29 0:46, Hamid Akhtar wrote: The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: not tested Spec compliant: not tested Documentation: not tested First of all, this seems like fixing a valid issue; albeit the probability of somebody messing this up is low, it is still better to fix this problem. I've not tested the patch in any detail, however, there are a couple of comments I have before I proceed on with detailed testing. Thanks for the review and comments! 1. pgindent is showing a few issues with formatting. Please have a look and resolve those. Yes. 2. I think you can potentially use the "len" variable instead of introducing the "buflen" and "tmplen" variables. Basically I don't want to use the same variable for several purposes because that would decrease the code readability. Also, I would choose a more appropriate name for the "tmp" variable. Yeah, so what about "rest" as the variable name? I believe if you move the following lines before the conditional statement and simply change the if statement to "if (len >= sizeof(buf) - 1)", it will serve the purpose. ISTM that this doesn't work correctly when the "buf" contains trailing carriage returns but not newlines (i.e., this line is too long so the "buf" doesn't include a newline). In this case, pg_strip_crlf() shortens the "buf" and then its return value "len" becomes less than sizeof(buf). So the following condition always becomes false unexpectedly in that case even though there is still the rest of the line to eat. + if (len >= sizeof(buf) - 1) + { + char tmp[LINELEN]; Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: Do we need to handle orphaned prepared transactions in the server?
Here is the v2 of the same patch after rebasing it and running it through pgindent. There are no other code changes. On Wed, Feb 19, 2020 at 8:04 PM Hamid Akhtar wrote: > All, > > Attached is version 1 of POC patch for notifying of orphaned > prepared transactions via warnings emitted to a client > application and/or log file. It applies to PostgreSQL branch > "master" on top of "e2e02191" commit. > > I've tried to keep the patch as less invasive as I could with > minimal impact on vacuum processes, so the performance impact > and the changes are minimal in that area of PostgreSQL core. > > > - What's in this Patch: > > This patch throws warnings when an autovacuum worker encounters > an orphaned prepared transaction. It also throws warnings to a > client when a vacuum command is issued. This patch also > introduces two new GUCs: > > (1) max_age_prepared_xacts > - The age after creation of a prepared transaction after which it > will be considered an orphan. > > (2) prepared_xacts_vacuum_warn_timeout > - The timeout period for an autovacuum (essentially any of its > worker) to check for orphaned prepared transactions and throw > warnings if any are found. > > > - What This Patch Does: > > If the GUCs are enabled (set to a value higher than -1), an > autovacuum worker running in the background checks if the > timeout has expired. If so, it checks if there are any orphaned > prepared transactions (i.e. their age has exceeded > max_age_prepared_xacts). If it finds any, it throws a warning for > every such transaction. It also emits the total number of orphaned > prepared transactions if one or more are found. > > When a vacuum command is issued from within a client, say psql, > in that case, we skip the vacuum timeout check and simply scan > for any orphaned prepared transactions. Warnings are emitted to > the client and log file if any are found. 
> > > - About the New GUCs: > > = max_age_prepared_xacts: > Sets the maximum age after which a prepared transaction is considered an > orphan. It applies when "prepared transactions" are enabled. The > age for a transaction is calculated from the time it was created to > the current time. If this value is specified without units, it is taken > as milliseconds. The default value is -1 which allows prepared > transactions to live forever. > > = prepared_xacts_vacuum_warn_timeout: > Sets the timeout after which vacuum starts throwing warnings for every > prepared transaction that has exceeded the maximum age defined by > "max_age_prepared_xacts". If this value is specified without units, > it is taken as milliseconds. The default value of -1 will disable > this warning mechanism. Setting too small a value could potentially fill > up the log with orphaned prepared transaction warnings, so this > parameter must be set to a value that is reasonably large to not > fill up the log file, but small enough to notify of long running and > potentially orphaned prepared transactions. There is no additional > timer or worker introduced with this change. Whenever a vacuum > worker runs, it first checks for any orphaned prepared transactions. > So at best, this GUC serves as a guideline for a vacuum worker > whether a warning should be thrown to the log file or a client issuing > a vacuum command. > > > - What this Patch Does Not Cover: > > The warning is not thrown when the user either runs vacuumdb or passes > individual relations to be vacuumed. Specifically in the case of vacuumdb, > it breaks down a vacuum command to an attribute-wise vacuum command. > So the vacuum command is indirectly run many times. Considering that > we want to emit warnings for every manual vacuum command, this simply > floods the terminal and log with orphaned prepared transaction > warnings. We could potentially handle that, but the overhead of > that seemed too much to me (and I've not invested any time looking > to fix that either). 
Hence, warnings are not thrown when user runs > vacuumdb and relation specific vacuum. > > > > On Fri, Jan 31, 2020 at 7:02 PM Hamid Akhtar > wrote: > >> >> >> On Thu, Jan 30, 2020 at 8:28 AM Craig Ringer >> wrote: >> >>> On Thu, 30 Jan 2020 at 02:04, Hamid Akhtar >>> wrote: >>> > >>> > So having seen the feedback on this thread, and I tend to agree with >>> most of what has been said here, I also agree that the server core isn't >>> really the ideal place to handle the orphan prepared transactions. >>> > >>> > Ideally, these must be handled by a transaction manager, however, I do >>> believe that we cannot let database suffer for failing of an external >>> software, and we did a similar change through introduction of idle in >>> transaction timeout behavior. >>> >>> The difference, IMO, is that idle-in-transaction aborts don't affect >>> anything we've promised to be durable. >>> >>> Once you PREPARE TRANSACTION the DB has made a promise that that txn >>> is durable. We don't have any consistent feedback channel to back to >>> applications and say "Hey, if you're not going to finish this up we >>> need to get
Re: Change atoi to strtol in same place
> On 11 Feb 2020, at 17:54, Alvaro Herrera wrote: > > On 2019-Dec-06, Joe Nelson wrote: > >> I see this patch has been moved to the next commitfest. What's the next >> step, does it need another review? > > This patch doesn't currently apply; it has conflicts with at least > 01368e5d9da7 and 7e735035f208; even in 7e735035f208^ it applies with > fuzz. Please post an updated version so that it can move forward. Ping. With the 2020-03 CommitFest now under way, are you able to supply a rebased patch for consideration? cheers ./daniel
Re: [BUG?] postgres_fdw incorrectly updates remote table if it has inherited children.
Fujita-san, > Unfortunately, I didn't have time to work on that (and won't in the > development cycle for PG13.) > > > Indeed, it is not an ideal query plan to execute for each updated rows... > > > > postgres=# explain select * from rtable_parent where tableoid = 126397 > > and ctid = '(0,11)'::tid; > >QUERY PLAN > > - > > Append (cost=0.00..5.18 rows=2 width=50) > >-> Seq Scan on rtable_parent (cost=0.00..1.15 rows=1 width=31) > > Filter: ((tableoid = '126397'::oid) AND (ctid = '(0,11)'::tid)) > >-> Tid Scan on rtable_child (cost=0.00..4.02 rows=1 width=68) > > TID Cond: (ctid = '(0,11)'::tid) > > Filter: (tableoid = '126397'::oid) > > (6 rows) > > IIRC, I think one of Tom's concerns about the solution I proposed was > that it added the tableoid restriction clause to the remote > UPDATE/DELETE query even if the remote table is not an inheritance > set. To add the clause only if the remote table is an inheritance > set, what I have in mind is to 1) introduce a new postgres_fdw table > option to indicate whether the remote table is an inheritance set or > not, and 2) determine whether to add the clause or not, using the > option. > I don't think a new option in postgres_fdw is a good solution, because the remote table structure is flexible regardless of the local configuration in the foreign-table options. People may add inherited child tables after the foreign table has been declared, which can create a configuration mismatch. Even if we always add a tableoid=OID restriction to the remote query, it shall be evaluated only after the TidScan has fetched the row pointed to by the ctid, so its additional cost is limited. And one potential benefit is that the tableoid=OID restriction can be used to prune unrelated partition leafs/inherited children at the planner stage. Probably, that is a separate piece of groundwork from postgres_fdw. Once the planner considers the built-in rule for this kind of optimization, the enhancement in postgres_fdw will be quite simple, I guess. What are your thoughts? 
Best regards, -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei
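The remote query shape under discussion — always appending the tableoid qual to the deparsed statement — can be sketched as below. This is a hypothetical simplification for illustration only; the function name and buffer handling are invented here, and postgres_fdw's real deparsing lives in contrib/postgres_fdw/deparse.c and works quite differently.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Hypothetical sketch: build the remote DELETE that postgres_fdw would
 * send, with the tableoid qual appended so that only the intended child
 * of a remote inheritance set is affected.  As noted above, the qual is
 * evaluated only after the TID scan fetches the row, so the added cost
 * per row is small. */
static int
build_remote_delete(char *buf, size_t buflen,
                    const char *relname, unsigned int reloid)
{
    return snprintf(buf, buflen,
                    "DELETE FROM %s WHERE ctid = $1 AND tableoid = %u",
                    relname, reloid);
}
```

Using the oid from the EXPLAIN example above, the generated statement would carry `tableoid = 126397`, matching only the intended remote child.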
Re: Planning counters in pg_stat_statements (using pgss_store)
On Mon, Mar 2, 2020 at 1:01 PM legrand legrand wrote:
>
> Julien Rouhaud wrote
> > On Sun, Mar 1, 2020 at 3:55 PM legrand legrand
> > <legrand_legrand@> wrote:
> >>
> >> >> I like the idea of adding a check for a non-zero queryId in the new
> >> >> pgss_planner_hook() (zero queryid shouldn't be reserved for
> >> >> utility_statements ?).
> >>
> >> > Some assert hit later, I can say that it's not always true. For
> >> > instance a CREATE TABLE AS won't run parse analysis for the underlying
> >> > query, as this has already been done for the original statement, but
> >> > will still call the planner. I'll change pgss_planner_hook to ignore
> >> > such cases, as pgss_store would otherwise think that it's a utility
> >> > statement. That'll probably incidentally fix the IVM incompatibility.
> >>
> >> Today with or without test on parse->queryId != UINT64CONST(0),
> >> CTAS is collected as a utility_statement without planning counter.
> >> This seems to me respecting the rule, not sure that this needs any
> >> new (risky) change to the actual (quite stable) patch.
> >
> > But the queryid ends up not being computed the same way:
> >
> > # select queryid, query, plans, calls from pg_stat_statements where
> > query like 'create table%';
> >        queryid       |             query              | plans | calls
> > ---------------------+--------------------------------+-------+-------
> >  8275950546884151007 | create table test as select 1; |     1 |     0
> >  7546197440584636081 | create table test as select 1  |     0 |     1
> > (2 rows)
> >
> > That's because CreateTableAsStmt->query doesn't have a query
> > location/len, as transformTopLevelStmt is only setting that for the
> > top level Query. That's probably an oversight in ab1f0c82257, but I'm
> > not sure what's the best way to fix that. Should we pass that
> > information to all transformXXX functions, or let transformTopLevelStmt
> > handle that.
> arf, this was not the case in my testing env (that is not up to date) :o(
> and would not have appeared at all with the proposed test on
> parse->queryId != UINT64CONST(0) ...

I'm not sure what exact behavior you had, but that shouldn't have changed since the previous version. The underlying query isn't a top-level statement, so maybe you didn't set pg_stat_statements.track = 'all'?
Re: Planning counters in pg_stat_statements (using pgss_store)
Julien Rouhaud wrote
> On Sun, Mar 1, 2020 at 3:55 PM legrand legrand
> <legrand_legrand@> wrote:
>>
>> >> I like the idea of adding a check for a non-zero queryId in the new
>> >> pgss_planner_hook() (zero queryid shouldn't be reserved for
>> >> utility_statements ?).
>>
>> > Some assert hit later, I can say that it's not always true. For
>> > instance a CREATE TABLE AS won't run parse analysis for the underlying
>> > query, as this has already been done for the original statement, but
>> > will still call the planner. I'll change pgss_planner_hook to ignore
>> > such cases, as pgss_store would otherwise think that it's a utility
>> > statement. That'll probably incidentally fix the IVM incompatibility.
>>
>> Today with or without test on parse->queryId != UINT64CONST(0),
>> CTAS is collected as a utility_statement without planning counter.
>> This seems to me respecting the rule, not sure that this needs any
>> new (risky) change to the actual (quite stable) patch.
>
> But the queryid ends up not being computed the same way:
>
> # select queryid, query, plans, calls from pg_stat_statements where
> query like 'create table%';
>        queryid       |             query              | plans | calls
> ---------------------+--------------------------------+-------+-------
>  8275950546884151007 | create table test as select 1; |     1 |     0
>  7546197440584636081 | create table test as select 1  |     0 |     1
> (2 rows)
>
> That's because CreateTableAsStmt->query doesn't have a query
> location/len, as transformTopLevelStmt is only setting that for the
> top level Query. That's probably an oversight in ab1f0c82257, but I'm
> not sure what's the best way to fix that. Should we pass that
> information to all transformXXX functions, or let transformTopLevelStmt
> handle that.

arf, this was not the case in my testing env (that is not up to date) :o(
and would not have appeared at all with the proposed test on
parse->queryId != UINT64CONST(0) ...

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: color by default
The patch to improve color support on Windows has been committed [1], and I would like to share some of the discussion there that might affect this patch.

- The documentation/comments could do a better job of explaining the case of PG_COLOR equal to 'always', explicitly saying that no checks are done on the output channel.

Aside from the decision about what the default coloring behaviour should be, there are parts of this patch that could be applied independently, as an improvement on the current state.

- The new function terminal_supports_color() should also apply when PG_COLOR is 'auto', to minimize the chances of seeing escape characters in the user terminal.

- The new entry in the documentation, especially as the PG_COLORS parameter seems to be currently undocumented. The programs that can use PG_COLOR would benefit from getting a link to it.

[1] https://www.postgresql.org/message-id/20200302064842.GE32059%40paquier.xyz

Regards,

Juan José Santamaría Flecha
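The PG_COLOR semantics discussed above can be modeled as follows. This is a hedged toy model, not the real implementation in src/common/logging.c: the function name `should_color` and the assumption that an unset variable means no color are simplifications for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model of the PG_COLOR decision: "always" colors unconditionally
 * (no check of the output channel, which is the documentation point
 * raised above), "auto" colors only when the output is a terminal,
 * and anything else -- including unset -- disables color. */
static bool
should_color(const char *pg_color, bool output_is_terminal)
{
    if (pg_color == NULL)
        return false;               /* assumed default: no color */
    if (strcmp(pg_color, "always") == 0)
        return true;                /* never checks the channel */
    if (strcmp(pg_color, "auto") == 0)
        return output_is_terminal;  /* isatty()-style check */
    return false;                   /* "never" or unrecognized */
}
```

The proposed terminal_supports_color() check would slot into the 'auto' branch, so escape sequences are suppressed on terminals that cannot render them.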
Re: Crash by targetted recovery
On 2020/02/28 12:13, Kyotaro Horiguchi wrote:

At Thu, 27 Feb 2020 20:04:41 +0900, Fujii Masao wrote in:

On 2020/02/27 17:05, Kyotaro Horiguchi wrote:

Thank you for the comment.

At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao wrote in:

On 2020/02/27 15:23, Kyotaro Horiguchi wrote:

I failed to understand why random access while reading from stream is a bad idea. Could you elaborate why?

It seems to me the word "streaming" suggests that WAL records should be read sequentially. Random access, which means reading from an arbitrary location, breaks a stream. (But the patch doesn't try to stop the wal sender if randAccess.)

Isn't it sufficient to set currentSource to 0 when disabling StandbyMode?

I thought that and it should work, but I hesitated to manipulate currentSource in StartupXLOG. currentSource is basically a private state of WaitForWALToBecomeAvailable. ReadRecord modifies it, but I think it's not good to modify it outside the logic in WaitForWALToBecomeAvailable.

If so, what about adding the following at the top of WaitForWALToBecomeAvailable()?

    if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
        currentSource = 0;

It works virtually the same way. I'm happy to do that if you don't agree to using randAccess. But I'd rather do that in the 'if (!InArchiveRecovery)' section.

The approach using randAccess seems unsafe. Please imagine the case where currentSource is changed to XLOG_FROM_ARCHIVE because randAccess is true, while walreceiver is still running. For example, this case can occur when the record at the REDO starting point is fetched with randAccess = true after walreceiver is invoked to fetch the last checkpoint record. The situation "currentSource != XLOG_FROM_STREAM while walreceiver is running" seems invalid. No?

When I mentioned a possibility of changing ReadRecord so that it modifies randAccess instead of currentSource, I thought that WaitForWALToBecomeAvailable should shut down the wal receiver as needed.
At Thu, 27 Feb 2020 15:23:07 +0900 (JST), Kyotaro Horiguchi wrote in:

me> location, breaks a stream. (But the patch doesn't try to stop wal
me> sender if randAccess.)

Sorry, I failed to notice that. And random access during StandbyMode usually (always?) lets RecPtr go back. I'm not sure WaitForWALToBecomeAvailable works correctly if we don't have a file in pg_wal and the REDO point is further back by more than a segment from the initial checkpoint record.

It works correctly. This is why WaitForWALToBecomeAvailable() uses the fetching_ckpt argument.

If we go back to XLOG_FROM_ARCHIVE by random access, it correctly re-connects to the primary for the past segment.

But this can lead to an unnecessary restart of walreceiver. Since fetching_ckpt ensures that the WAL file containing the REDO starting record exists in pg_wal, there is no need to reconnect to the primary when reading the REDO starting record. Is there any other case where we need to go back to XLOG_FROM_ARCHIVE by random access?

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
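The check Fujii-san proposed for the top of WaitForWALToBecomeAvailable() can be sketched in isolation as below. The enum values and function are simplified stand-ins for the real ones in xlog.c, used here only to show the intended state transition.

```c
#include <stdbool.h>

/* Simplified stand-ins for the XLOG_FROM_* values used in xlog.c;
 * 0 means "no source chosen yet". */
enum
{
    XLOG_FROM_ANY = 0,
    XLOG_FROM_ARCHIVE,
    XLOG_FROM_PG_WAL,
    XLOG_FROM_STREAM
};

/* Sketch of the proposed check: once standby mode is off, a lingering
 * XLOG_FROM_STREAM source is reset so that the normal source-selection
 * logic runs again on the next call, instead of continuing to read from
 * a stream that is no longer appropriate. */
static int
next_source(bool standby_mode, int current_source)
{
    if (!standby_mode && current_source == XLOG_FROM_STREAM)
        return XLOG_FROM_ANY;   /* forces re-selection of the WAL source */
    return current_source;
}
```

Keeping the reset inside WaitForWALToBecomeAvailable() preserves Horiguchi-san's point that currentSource is that function's private state, rather than manipulating it from StartupXLOG.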
Re: logical replication empty transactions
On Mon, Mar 2, 2020 at 9:01 AM Dilip Kumar wrote:
>
> On Sat, Nov 9, 2019 at 7:29 AM Euler Taveira wrote:
> >
> > Em seg., 21 de out. de 2019 às 21:20, Jeff Janes escreveu:
> > >
> > > After setting up logical replication of a slowly changing table using the
> > > built-in pub/sub facility, I noticed way more network traffic than made
> > > sense. Looking into it, I see that every transaction in that database on
> > > the master gets sent to the replica. 99.999+% of them are empty
> > > transactions ('B' message and 'C' message with nothing in between)
> > > because the transactions don't touch any tables in the publication, only
> > > non-replicated tables. Is doing it this way necessary for some reason?
> > > Couldn't we hold the transmission of 'B' until something else comes
> > > along, and then if that next thing is 'C' drop both of them?
> >
> > That is not optimal. Those empty transactions are a waste of bandwidth.
> > We can suppress them if no changes will be sent. test_decoding
> > implements "skip empty transaction" as you described above and I did
> > something similar to it. Patch is attached.
>
> I think this significantly reduces the network bandwidth for empty
> transactions. I have briefly reviewed the patch and it looks good to
> me.

One thing that is not clear to me is how we will advance restart_lsn if we don't send any empty xacts in a system where there are many such xacts. IIRC, the restart_lsn is advanced based on the confirmed_flush lsn sent by the subscriber. After this change, the subscriber won't be able to send the confirmed_flush, and for a long time we won't be able to advance restart_lsn. Is that correct, and if so, why do we think that is acceptable? One might argue that restart_lsn will be advanced as soon as we send the first non-empty xact, but I am not sure if that is good enough. What do you think?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
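The "skip empty transactions" idea Jeff describes — hold the 'B' message until something else arrives, and drop both 'B' and 'C' if nothing does — can be sketched as a toy state machine. This is a hedged illustration, not the test_decoding or pgoutput code; the struct and message letters are invented for the example.

```c
#include <stddef.h>

/* Toy model of an output-plugin stream that defers the begin message. */
typedef struct
{
    int  begin_pending;  /* 'B' decoded but not yet sent */
    char sent[32];       /* messages actually written to the stream */
    int  nsent;
} TxnStream;

static void
on_begin(TxnStream *s)
{
    s->begin_pending = 1;            /* defer sending 'B' */
}

static void
on_change(TxnStream *s)
{
    if (s->begin_pending)
    {
        s->sent[s->nsent++] = 'B';   /* flush the deferred begin */
        s->begin_pending = 0;
    }
    s->sent[s->nsent++] = 'I';       /* e.g. an insert message */
}

static void
on_commit(TxnStream *s)
{
    if (s->begin_pending)
    {
        s->begin_pending = 0;        /* empty txn: drop both 'B' and 'C' */
        return;
    }
    s->sent[s->nsent++] = 'C';
}
```

Note that in this model an all-empty workload sends nothing at all, which is exactly Amit's concern: with no traffic, the subscriber has nothing to acknowledge via confirmed_flush, so restart_lsn cannot advance.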
Re: [PATCH] Add schema and table names to partition error
On Mon, Mar 2, 2020 at 9:39 AM Amit Langote wrote: > > > > My (very limited) understanding is that partition > > "constraints" are entirely contained within pg_class.relpartbound of the > > partition. > > That is correct. > > > Are you suggesting that the table name go into the constraint name field > > of the error? > > Yes, that's what I was thinking, at least for "partition constraint > violated" errors, but given the above that would be a misleading use > of ErrorData.constraint_name. > I think it is better to use errtable in such cases. > Maybe it's not too late to invent a new error code like > ERRCODE_PARTITION_VIOLATION or such, then maybe we can use a > hard-coded name, say, just the string "partition constraint". > > > >> There are couple more instances in src/backend/command/tablecmds.c > > >> where partition constraint is checked: > > >> > > >> Maybe, better fix these too for completeness. > > > > Done. As there is no named constraint here, I used errtable again. > > > > > Right, if we want to make a change for this, then I think we can once > > > check all the places where we use error code ERRCODE_CHECK_VIOLATION. > > > > I've looked at every instance of this. It is used for 1) check > > constraints, 2) domain constraints, and 3) partition constraints. > > > > 1. errtableconstraint is used with the name of the constraint. > > 2. errdomainconstraint is used with the name of the constraint except in > > one instance which deliberately uses errtablecol. > > 3. With the attached patch, errtable is used except for one instance in > > src/backend/partitioning/partbounds.c described below. > > > > In check_default_partition_contents of > > src/backend/partitioning/partbounds.c, the default partition is checked > > for any rows that should belong in the partition being added _unless_ > > the leaf being checked is a foreign table. 
> > There are two tables mentioned in this warning, and I couldn't decide
> > which, if any, deserves to be in the error fields:
> >
> >     /*
> >      * Only RELKIND_RELATION relations (i.e. leaf partitions) need to be
> >      * scanned.
> >      */
> >     if (part_rel->rd_rel->relkind != RELKIND_RELATION)
> >     {
> >         if (part_rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
> >             ereport(WARNING,
> >                     (errcode(ERRCODE_CHECK_VIOLATION),
> >                      errmsg("skipped scanning foreign table \"%s\" which is a partition of default partition \"%s\"",
> >                             RelationGetRelationName(part_rel),
> >                             RelationGetRelationName(default_rel))));
>
> It seems strange to see that errcode here or any errcode for that
> matter, so we shouldn't really be concerned about this one.

Right.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
Hello

I reviewed the recently published patch. Looks good to me. One small note: the values for the new definitions in progress.h seem not to be aligned vertically. However, pgindent doesn't object.

regards, Sergei
Re: Cleanup - Removal of apply_typmod function in #if 0
On Mon, Mar 2, 2020 at 1:27 PM Peter Eisentraut wrote: > > On 2019-10-26 10:36, vignesh C wrote: > > One of the function apply_typmod in numeric.c file present within #if 0. > > This is like this for many years. > > I felt this can be removed. > > Attached patch contains the changes to handle removal of apply_typmod > > present in #if 0. > > committed > Thanks Peter for committing. Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: Allow to_date() and to_timestamp() to accept localized names
On Sun, Mar 1, 2020 at 11:25 PM Tom Lane wrote:
>
> > Barring objections, this seems
> > committable to me.
>
> Going once, going twice ...

You have moved this to a better place, so no objections from me, and thank you.

Regards,

Juan José Santamaría Flecha
Re: BUG #16108: Colorization to the output of command-line has unproperly behaviors at Windows platform
On Mon, Mar 2, 2020 at 7:48 AM Michael Paquier wrote: > > On top of that, and that's a separate issue, I have noticed that we > have exactly zero documentation about PG_COLORS (the plural flavor, > not the singular), but we have code for it in common/logging.c.. > Yeah, there is nothing about it prior to [1]. So, this conversation will have to be carried over there. > Anyway, committed down to 12, after tweaking a few things. > Thank you. [1] https://www.postgresql.org/message-id/bbdcce43-bd2e-5599-641b-9b44b9e0a...@2ndquadrant.com Regards, Juan José Santamaría Flecha
Re: Planning counters in pg_stat_statements (using pgss_store)
On Sun, Mar 1, 2020 at 3:55 PM legrand legrand wrote:
>
> >> I like the idea of adding a check for a non-zero queryId in the new
> >> pgss_planner_hook() (zero queryid shouldn't be reserved for
> >> utility_statements ?).
>
> > Some assert hit later, I can say that it's not always true. For
> > instance a CREATE TABLE AS won't run parse analysis for the underlying
> > query, as this has already been done for the original statement, but
> > will still call the planner. I'll change pgss_planner_hook to ignore
> > such cases, as pgss_store would otherwise think that it's a utility
> > statement. That'll probably incidentally fix the IVM incompatibility.
>
> Today with or without test on parse->queryId != UINT64CONST(0),
> CTAS is collected as a utility_statement without planning counter.
> This seems to me respecting the rule, not sure that this needs any
> new (risky) change to the actual (quite stable) patch.

But the queryid ends up not being computed the same way:

# select queryid, query, plans, calls from pg_stat_statements where query like 'create table%';
       queryid        |             query              | plans | calls
----------------------+--------------------------------+-------+-------
  8275950546884151007 | create table test as select 1; |     1 |     0
  7546197440584636081 | create table test as select 1  |     0 |     1
(2 rows)

That's because CreateTableAsStmt->query doesn't have a query location/len, as transformTopLevelStmt is only setting that for the top-level Query. That's probably an oversight in ab1f0c82257, but I'm not sure what's the best way to fix that. Should we pass that information to all transformXXX functions, or let transformTopLevelStmt handle that?
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
On 2020/02/26 23:18, Fujii Masao wrote:

On 2020/02/18 21:31, Fujii Masao wrote:

On 2020/02/18 16:53, Amit Langote wrote:

On Tue, Feb 18, 2020 at 4:42 PM Fujii Masao wrote:

On 2020/02/18 16:02, Amit Langote wrote:

I noticed that there is a missing tag in the documentation changes:

Could you tell me where I should add the tag?

+ + waiting for checkpoint to finish + + The WAL sender process is currently performing + pg_start_backup to set up for + taking a base backup, and waiting for backup start + checkpoint to finish. + +

There should be a between and at the end of the hunk shown above.

Will fix. Thanks!

Just to clarify, that's the missing tag I am talking about above.

OK, so I added the tag just after the above.

+ + Whenever pg_basebackup is taking a base + backup, the pg_stat_progress_basebackup + view will contain a row for each WAL sender process that is currently + running BASE_BACKUP replication command + and streaming the backup.

I understand that you wrote "Whenever pg_basebackup is taking a backup...", because the description of other views contains a similar starting line. But it may not only be pg_basebackup that would be served by this view, no? It could be any tool that speaks Postgres' replication protocol and thus is able to send a BASE_BACKUP command. If that is correct, I would write something like "When an application is taking a backup" or some such, without a specific reference to pg_basebackup. Thoughts?

Yeah, there may be some such applications. But most users would use pg_basebackup, so getting rid of the reference to pg_basebackup would make the description a bit difficult to read. Also I can imagine that a user of those backup applications would get to know the progress reporting view from their documents. So I prefer the existing one or something like "Whenever an application like pg_basebackup ...". Thoughts?

Sure, "an application like pg_basebackup" sounds fine to me.

OK, I changed the doc that way. Attached is the updated version of the patch.
Attached is the updated version of the patch. The previous patch used only pgstat_progress_update_param() even when updating multiple values. Since those updates are not atomic, this can cause readers of the values to see the intermediate states. To avoid this issue, the latest patch uses pgstat_progress_update_multi_param(), instead. Attached the updated version of the patch. Barring any objections, I plan to commit this patch. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 87586a7b06..dd4a668eea 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -376,6 +376,14 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + pg_stat_progress_basebackuppg_stat_progress_basebackup + One row for each WAL sender process streaming a base backup, + showing current progress. + See . + + + @@ -3535,7 +3543,10 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, certain commands during command execution. Currently, the only commands which support progress reporting are ANALYZE, CLUSTER, - CREATE INDEX, and VACUUM. + CREATE INDEX, VACUUM, + and (i.e., replication + command that issues to take + a base backup). This may be expanded in the future. @@ -4336,6 +4347,156 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, + + + + Base Backup Progress Reporting + + + Whenever an application like pg_basebackup + is taking a base backup, the + pg_stat_progress_basebackup + view will contain a row for each WAL sender process that is currently + running BASE_BACKUP replication command + and streaming the backup. The tables below describe the information + that will be reported and provide information about how to interpret it. + + + + pg_stat_progress_basebackup View + + + + Column + Type + Description + + + + + + pid + integer + Process ID of a WAL sender process. 
+ + + phase + text + Current processing phase. See . + + + backup_total + bigint + + Total amount of data that will be streamed. If progress reporting + is not enabled in pg_basebackup + (i.e., --progress option is not specified), + this is 0. Otherwise, this is estimated and + reported as of the beginning of + streaming database files phase. Note that + this is only an approximation since the database + may change during streaming database files phase + and WAL log may be included in the backup later. This is always + the same value as backup_streamed + once the amount of data streamed exceeds the
Re: Trying to pull up EXPR SubLinks
On Fri, Feb 28, 2020 at 11:35 PM Tom Lane wrote: > Richard Guo writes: > > On Fri, Feb 28, 2020 at 3:02 PM Andy Fan > wrote: > >> Glad to see this. I think the hard part is this transform is not > *always* > >> good. for example foo.a only has 1 rows, but bar has a lot of rows, if > >> so the original would be the better one. > > > Yes exactly. TBH I'm not sure how to achieve that. > > Yeah, I was about to make the same objection when I saw Andy already had. > Without some moderately-reliable way of estimating whether the change > is actually a win, I think we're better off leaving it out. The user > can always rewrite the query for themselves if the grouped implementation > would be better -- but if the planner just does it blindly, there's no > recourse when it's worse. > Yes, that makes sense. > > > Any ideas on this part? > > I wonder whether it'd be possible to rewrite the query, but then > consider two implementations, one where the equality clause is > pushed down into the aggregating subquery as though it were LATERAL. > You'd want to be able to figure out that the presence of that clause > made it unnecessary to do the GROUP BY ... but having done so, a > plan treating the aggregating subquery as LATERAL ought to be pretty > nearly performance-equivalent to the current way. So this could be > mechanized in the current planner structure by treating that as a > parameterized path for the subquery, and comparing it to unparameterized > paths that calculate the full grouped output. > I suppose this would happen in/around function set_subquery_pathlist. When we generate access paths for the subquery, we try to push down the equality clause into subquery, remove the unnecessary GROUP BY, etc. and then perform another run of subquery_planner to generate the parameterized path, and add it to the RelOptInfo for the subquery. So that we can do comparison to unparameterized paths. Am I understanding it correctly? 
> > Obviously it'd be a long slog from here to there, but it seems like > maybe that could be made to work. There's a separate question about > whether it's really worth the trouble, seeing that the optimization > is available today to people who rewrite their queries. > If I understand correctly as above, yes, this would take quite a lot of effort. Not sure if it's still worth doing. Thanks Richard