Re: Parallel copy
On Wed, Feb 26, 2020 at 8:47 PM Ants Aasma wrote:
> On Tue, 25 Feb 2020 at 18:00, Tomas Vondra wrote:
> > Perhaps. I guess it'll depend on the CSV file (number of fields, ...),
> > so I still think we need to do some measurements first. I'm willing to
> > do that, but (a) I doubt I'll have time for that until after 2020-03,
> > and (b) it'd be good to agree on some set of typical CSV files.
>
> I agree that getting a nice varied dataset would be nice. Including
> things like narrow integer only tables, strings with newlines and
> escapes in them, extremely wide rows.
>
> I tried to capture a quick profile just to see what it looks like.
> Grabbed a random open data set from the web, about 800MB of narrow
> rows CSV [1].
>
> Script:
> CREATE TABLE census (year int,age int,ethnic int,sex int,area text,count text);
> COPY census FROM '.../Data8277.csv' WITH (FORMAT 'csv', HEADER true);
>
> Profile:
> # Samples: 59K of event 'cycles:u'
> # Event count (approx.): 57644269486
> #
> # Overhead  Command   Shared Object  Symbol
> # ........  ........  .............  ......
> #
>     18.24%  postgres  postgres       [.] CopyReadLine
>      9.23%  postgres  postgres       [.] NextCopyFrom
>      8.87%  postgres  postgres       [.] NextCopyFromRawFields
>      5.82%  postgres  postgres       [.] pg_verify_mbstr_len
>      5.45%  postgres  postgres       [.] pg_strtoint32
>      4.16%  postgres  postgres       [.] heap_fill_tuple
>      4.03%  postgres  postgres       [.] heap_compute_data_size
>      3.83%  postgres  postgres       [.] CopyFrom
>      3.78%  postgres  postgres       [.] AllocSetAlloc
>      3.53%  postgres  postgres       [.] heap_form_tuple
>      2.96%  postgres  postgres       [.] InputFunctionCall
>      2.89%  postgres  libc-2.30.so   [.] __memmove_avx_unaligned_erms
>      1.82%  postgres  libc-2.30.so   [.] __strlen_avx2
>      1.72%  postgres  postgres       [.] AllocSetReset
>      1.72%  postgres  postgres       [.] RelationPutHeapTuple
>      1.47%  postgres  postgres       [.] heap_prepare_insert
>      1.31%  postgres  postgres       [.] heap_multi_insert
>      1.25%  postgres  postgres       [.] textin
>      1.24%  postgres  postgres       [.] int4in
>      1.05%  postgres  postgres       [.] tts_buffer_heap_clear
>      0.85%  postgres  postgres       [.] pg_any_to_server
>      0.80%  postgres  postgres       [.] pg_comp_crc32c_sse42
>      0.77%  postgres  postgres       [.] cstring_to_text_with_len
>      0.69%  postgres  postgres       [.] AllocSetFree
>      0.60%  postgres  postgres       [.] appendBinaryStringInfo
>      0.55%  postgres  postgres       [.] tts_buffer_heap_materialize.part.0
>      0.54%  postgres  postgres       [.] palloc
>      0.54%  postgres  libc-2.30.so   [.] __memmove_avx_unaligned
>      0.51%  postgres  postgres       [.] palloc0
>      0.51%  postgres  postgres       [.] pg_encoding_max_length
>      0.48%  postgres  postgres       [.] enlargeStringInfo
>      0.47%  postgres  postgres       [.] ExecStoreVirtualTuple
>      0.45%  postgres  postgres       [.] PageAddItemExtended
>
> So that confirms that the parsing is a huge chunk of overhead with
> current splitting into lines being the largest portion. Amdahl's law
> says that splitting into tuples needs to be made fast before
> parallelizing makes any sense.

I had taken a perf report with the same test data that you had used, and I
was getting the following results:

+   99.61%  0.00%  postgres  postgres  [.] PortalRun
+   99.61%  0.00%  postgres  postgres  [.] PortalRunMulti
+   99.61%  0.00%  postgres  postgres  [.] PortalRunUtility
+   99.61%  0.00%  postgres  postgres  [.] ProcessUtility
+   99.61%  0.00%  postgres  postgres  [.] standard_ProcessUtility
+   99.61%  0.00%  postgres  postgres  [.] DoCopy
+   99.30%  0.94%  postgres  postgres  [.] CopyFrom
+   51.61%  7.76%  postgres  postgres  [.] NextCopyFrom
+   23.66%  0.01%  postgres  postgres  [.] CopyMultiInsertInfoFlush
+   23.61%  0.28%  postgres  postgres  [.] CopyMultiInsertBufferFlush
+   21.99%  1.02%  postgres  postgres  [.] NextCopyFromRawFields
+   19.79%  0.01%  postgres  postgres  [.] table_multi_insert
+   19.32%  3.00%  postgres  postgres  [.] heap_multi_insert
+   18.27%  2.44%  postgres  postgres  [.] InputFunctionCall
+   15.19%  0.89%  postgres  postgres  [.] CopyReadLine
+   13.05%  0.18%  postgres  postgres  [.] ExecMaterializeSlot
+   13.00%  0.55%  postgres  postgres  [.] tts_buffer_heap_materialize
+   12.31%  1.77%  postgres  postgres  [.] heap_form_tuple
+   10.43%
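The Amdahl's-law point above can be made concrete with a quick back-of-envelope calculation. In the sketch below (illustrative only, not part of the thread's measurements), the ~18% figure is CopyReadLine's share from the first profile, taken as the fraction that stays sequential:

```python
def amdahl_speedup(serial_fraction, workers):
    """Upper bound on speedup when `serial_fraction` of the work
    cannot be parallelised (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)

# If line splitting (CopyReadLine, ~18% above) stays sequential,
# speedup is capped near 1/0.18, no matter how many workers we add:
for workers in (2, 4, 8, 1000):
    print(workers, round(amdahl_speedup(0.18, workers), 2))
```

This is why the thread argues that splitting into lines/tuples has to be made fast before parallelising the rest pays off.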
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
On 2020/03/03 14:37, Shinoda, Noriyoshi (PN Japan A Delivery) wrote:
> Hi,
> Thank you for developing good features. The attached patch is a small fix
> to the committed documentation. This patch fixes the description literal
> for the backup_streamed column.

Thanks for the report and patch! Pushed.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
Re: Parallel copy
On Wed, Feb 26, 2020 at 4:24 PM Amit Kapila wrote:
> On Tue, Feb 25, 2020 at 9:30 PM Tomas Vondra wrote:
> > On Sun, Feb 23, 2020 at 05:09:51PM -0800, Andres Freund wrote:
> > > Hi,
> > >
> > > > The one piece of information I'm missing here is at least a very rough
> > > > quantification of the individual steps of CSV processing - for example
> > > > if parsing takes only 10% of the time, it's pretty pointless to start by
> > > > parallelising this part and we should focus on the rest. If it's 50% it
> > > > might be a different story. Has anyone done any measurements?
> > >
> > > Not recently, but I'm pretty sure that I've observed CSV parsing to be
> > > way more than 10%.
> >
> > Perhaps. I guess it'll depend on the CSV file (number of fields, ...),
> > so I still think we need to do some measurements first.
>
> Agreed.
>
> > I'm willing to
> > do that, but (a) I doubt I'll have time for that until after 2020-03,
> > and (b) it'd be good to agree on some set of typical CSV files.
>
> Right, I don't know what is the best way to define that. I can think
> of the below tests.
>
> 1. A table with 10 columns (with datatypes as integers, date, text).
> It has one index (unique/primary). Load with 1 million rows (basically
> the data should be probably 5-10 GB).
> 2. A table with 10 columns (with datatypes as integers, date, text).
> It has three indexes, one index can be (unique/primary). Load with 1
> million rows (basically the data should be probably 5-10 GB).
> 3. A table with 10 columns (with datatypes as integers, date, text).
> It has three indexes, one index can be (unique/primary). It has before
> and after triggers. Load with 1 million rows (basically the data
> should be probably 5-10 GB).
> 4. A table with 10 columns (with datatypes as integers, date, text).
> It has five or six indexes, one index can be (unique/primary). Load
> with 1 million rows (basically the data should be probably 5-10 GB).
I have tried to capture the execution time taken for 3 scenarios which I felt
could give a fair idea:
Test1 (Table with 3 indexes and 1 trigger)
Test2 (Table with 2 indexes)
Test3 (Table without indexes/triggers)

I have captured the following details:
File Read Time      - time taken to read the file from the CopyGetData function.
Read Line Time      - time taken to read a line from the NextCopyFrom function
                      (read time & tokenise time excluded).
Tokenize Time       - time taken to tokenise the contents from the
                      NextCopyFromRawFields function.
Data Execution Time - remaining execution time from the total time.

The execution breakdown for the tests is given below (all times in seconds):

Test    Total Time   File Read   Read Line/         Tokenize   Data Execution
                     Time        Buffer Read Time   Time       Time
Test1   1693.369     0.256       34.173             5.578      1653.362
Test2    736.096     0.288       39.762             6.525       689.521
Test3    112.060     0.266       39.189             6.433        66.172

Steps for the scenarios:

Test1 (Table with 3 indexes and 1 trigger):
CREATE TABLE census2 (year int,age int,ethnic int,sex int,area text,count text);
CREATE TABLE census3 (year int,age int,ethnic int,sex int,area text,count text);
CREATE INDEX idx1_census2 on census2(year);
CREATE INDEX idx2_census2 on census2(age);
CREATE INDEX idx3_census2 on census2(ethnic);
CREATE or REPLACE FUNCTION census2_afterinsert() RETURNS TRIGGER AS $$
BEGIN
  INSERT INTO census3 SELECT * FROM census2 limit 1;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER census2_trigger AFTER INSERT ON census2 FOR EACH ROW EXECUTE PROCEDURE census2_afterinsert();
COPY census2 FROM 'Data8277.csv' WITH (FORMAT 'csv', HEADER true);

Test2 (Table with 2 indexes):
CREATE TABLE census1 (year int,age int,ethnic int,sex int,area text,count text);
CREATE INDEX idx1_census1 on census1(year);
CREATE INDEX idx2_census1 on census1(age);
COPY census1 FROM 'Data8277.csv' WITH (FORMAT 'csv', HEADER true);

Test3 (Table without indexes/triggers):
CREATE TABLE census (year int,age int,ethnic int,sex int,area text,count text);
COPY census FROM 'Data8277.csv' WITH (FORMAT 'csv', HEADER true);

Note: The Data8277.csv used was the same data that Ants Aasma had used.

From the above results we can infer that reading lines will have to be done
sequentially. Read line time takes about 2.01%, 5.40% and 34.97% of the total
time in the three tests respectively. I felt we will be able to parallelise
the remaining phases of the copy. The performance improvement will vary based
on the scenario (indexes/triggers); it will be proportionate to the number of
indexes and triggers. Read line can also be parallelised in text format
(non-csv). I feel parallelising copy could give a significant improvement in
quite some scenarios.

Further, I'm planning to see how the execution will be for a toast table. I'm
also planning to do a test on a RAM disk, where I will configure the data on
the RAM disk so that we can further eliminate the I/O cost.

Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
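As a sanity check on the percentages quoted above, the sequential fraction (read line time / total time) and the resulting parallel-speedup ceiling can be recomputed from the table. The script below is just illustrative arithmetic, not part of the test harness used for the measurements:

```python
# Total and read-line times (seconds) from the breakdown table above.
timings = {
    "Test1": {"total": 1693.369, "read_line": 34.173},
    "Test2": {"total": 736.096,  "read_line": 39.762},
    "Test3": {"total": 112.060,  "read_line": 39.189},
}

for name, t in timings.items():
    serial = t["read_line"] / t["total"]   # fraction that must stay sequential
    ceiling = 1.0 / serial                 # Amdahl's law bound, infinite workers
    print(f"{name}: read line = {serial:.2%} of total, max speedup ~ {ceiling:.1f}x")
```

So the no-index case (Test3), where line reading is ~35% of the work, caps out near a 2.9x speedup unless line splitting itself is parallelised, while the index/trigger-heavy cases leave far more room.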
RE: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
Hi,

Thank you for developing good features. The attached patch is a small fix to
the committed documentation. This patch fixes the description literal for the
backup_streamed column.

Regards,
Noriyoshi Shinoda

-----Original Message-----
From: Fujii Masao [mailto:masao.fu...@oss.nttdata.com]
Sent: Tuesday, March 3, 2020 12:09 PM
To: Kyotaro Horiguchi
Cc: amitlangot...@gmail.com; masahiko.saw...@2ndquadrant.com; pgsql-hack...@postgresql.org
Subject: Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

On 2020/03/03 9:27, Kyotaro Horiguchi wrote:
> At Mon, 2 Mar 2020 17:29:30 +0900, Fujii Masao wrote in
> > > Attached is the updated version of the patch.
> > > The previous patch used only pgstat_progress_update_param() even
> > > when updating multiple values. Since those updates are not atomic,
> > > this can cause readers of the values to see the intermediate states.
> > > To avoid this issue, the latest patch uses
> > > pgstat_progress_update_multi_param(), instead.
> >
> > Attached the updated version of the patch.
> > Barring any objections, I plan to commit this patch.
>
> It is working as designed and the status names are fine to me.

Thanks for the review! I pushed the patch.

> The last one comment from me.
> The newly defined symbols have inconsistent indents.

Yes, I fixed that.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters

monitoring_sgml.diff
Description: monitoring_sgml.diff
Re: color by default
On Mon, Mar 02, 2020 at 01:00:44PM +0100, Juan José Santamaría Flecha wrote:
> - The new entry in the documentation, specially as the PG_COLORS parameter
> seems to be currently undocumented. The programs that can use PG_COLOR
> would benefit from getting a link to it.

The actual problem here is that we don't have an actual centralized place
where we could put that stuff. And anything able to use this option is
basically anything using src/common/logging.c.

Regarding PG_COLORS, the commit message of cc8d415 mentions it, but we have
no actual example of how to use it, and the original thread has zero
reference to it:
https://www.postgresql.org/message-id/6a609b43-4f57-7348-6480-bd022f924...@2ndquadrant.com

And in fact, it took me a while to figure out that using it is a mix of
three keywords ("error", "warning" or "locus") separated by colons which
need to have an equal sign to the color defined. Here is for example how to
make the locus show up in yellow with errors in blue:

export PG_COLORS='error=01;34:locus=01;33'

Having to dig into the code to find out that stuff is not a good user
experience. And I found out about that only because I worked on a patch
touching this area yesterday.

-- 
Michael

signature.asc
Description: PGP signature
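The PG_COLORS format described above — keyword=SGR-codes pairs separated by colons, with the SGR codes themselves using semicolons — can be sketched like this. This is a hypothetical illustrative parser, not the actual src/common/logging.c code:

```python
def parse_pg_colors(value):
    """Split a PG_COLORS-style string such as 'error=01;34:locus=01;33'
    into a {keyword: sgr_codes} dict.  Pairs are colon-separated; the
    SGR escape codes inside a pair use semicolons, so they survive the
    split on ':' intact."""
    colors = {}
    for pair in value.split(":"):
        if "=" in pair:
            key, _, sgr = pair.partition("=")
            colors[key] = sgr
    return colors

print(parse_pg_colors("error=01;34:locus=01;33"))
# {'error': '01;34', 'locus': '01;33'}
```

The recognized keywords per the message above are "error", "warning" and "locus"; the values are standard ANSI SGR attribute lists (e.g. 01;33 for bold yellow).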
Re: Fastpath while arranging the changes in LSN order in logical decoding
On Wed, Feb 19, 2020 at 6:00 AM David Zhang wrote:
>
> After manually applied the patch, a diff regenerated is attached.
>
> On 2020-02-18 4:16 p.m., David Zhang wrote:
> > 1. Tried to apply the patch to PG 12.2 commit
> > 45b88269a353ad93744772791feb6d01bc7e1e42 (HEAD -> REL_12_2, tag: REL_12_2),
> > it doesn't work. Then tried to check the patch, and found the errors
> > showing below.
> > $ git apply --check 0001-Fastpath-for-sending-changes-to-output-plugin-in-log.patch
> > error: patch failed: contrib/test_decoding/logical.conf:1
> > error: contrib/test_decoding/logical.conf: patch does not apply
> > error: patch failed: src/backend/replication/logical/reorderbuffer.c:1133
> > error: src/backend/replication/logical/reorderbuffer.c: patch does not apply
> >
> > 2. Ran a further check for file "logical.conf", and found there is only one
> > commit since 2014, which doesn't have the parameter
> > "logical_decoding_work_mem = 64kB".
> >
> > 3. Manually apply the patch including
> > src/backend/replication/logical/reorderbuffer.c, and then ran a simple
> > logical replication test. A connection issue is found like below,
> > "table public.pgbench_accounts: INSERT: aid[integer]:4071 bid[integer]:1
> > abalance[integer]:0 filler[character]:'      '
> > pg_recvlogical: error: could not receive data from WAL stream: server
> > closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > pg_recvlogical: disconnected; waiting 5 seconds to try again"
> >
> > 4. This connection issue can be reproduced on PG 12.2 commit mentioned
> > above, the basic steps:
> > 4.1 Change "wal_level = logical" in "postgresql.conf"
> > 4.2 Create a logical slot and listen on it,
> > $ pg_recvlogical -d postgres --slot test --create-slot
> > $ pg_recvlogical -d postgres --slot test --start -f -
> >
> > 4.3 From another terminal, run the command below,
> > $ pgbench -i -p 5432 -d postgres
> >
> > Let me know if I did something wrong, and if a new patch is available, I
> > can re-run the test on the same environment.

Thanks for testing and rebasing. I think one of the hunks is missing in your
rebased version. That could be the reason for failure. Can you test on my
latest version?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Fastpath while arranging the changes in LSN order in logical decoding
On Tue, Mar 3, 2020 at 8:42 AM Dilip Kumar wrote: > > On Mon, Mar 2, 2020 at 7:27 PM David Steele wrote: > > > > Hi Dilip, > > > > On 2/18/20 7:30 PM, David Zhang wrote: > > > After manually applied the patch, a diff regenerated is attached. > > > > David's updated patch applies but all logical decoding regression tests > > are failing on cfbot. > > > > Do you know when you will be able to supply an updated patch? > > I will try to send in a day or two. I have rebased the patch. check-world is passing. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com v2-0001-Fastpath-for-sending-changes-to-output-plugin-in-.patch Description: Binary data
Re: [PATCH] Add schema and table names to partition error
On 3/1/20 10:09 PM, Amit Langote wrote:
> Hi Chris,
>
> On Mon, Mar 2, 2020 at 8:51 AM Chris Bandy wrote:
> > On 3/1/20 5:14 AM, Amit Kapila wrote:
> > > On Sun, Mar 1, 2020 at 10:10 AM Amit Langote wrote:
> > > > There are a couple more instances in src/backend/commands/tablecmds.c
> > > > where the partition constraint is checked. Maybe better fix these too,
> > > > for completeness.
> > >
> > > Another thing we might need to see is which of these can be
> > > back-patched. We should also try to write the tests for cases we are
> > > changing even if we don't want to commit those.
> >
> > I don't have any opinion on back-patching. Existing tests pass. I wasn't
> > able to find another test that checks the constraint field of errors.
> > There's a little bit in the tests for psql, but that is about the
> > \errverbose functionality rather than specific errors and their fields.
>
> Actually, it's not a bad idea to use \errverbose to test this.

I've added a second patch with tests that cover three of the five errors
touched by the first patch. Rather than \errverbose, I simply \set VERBOSITY
verbose. I could not find a way to exclude the location field from the
output, so those lines will likely be out of date soon--if not already.

I couldn't find a way to exercise the errors in tablecmds.c. Does anyone
know how to instigate a table rewrite that would violate partition
constraints?
I tried:

ALTER TABLE pterr1 ALTER y TYPE bigint USING (y - 5);
ERROR:  42P16: cannot alter column "y" because it is part of the partition key of relation "pterr1"
LOCATION:  ATPrepAlterColumnType, tablecmds.c:10812

Thanks,
Chris

>From c6b39accf9f51f9c08a2fc62e848144776e23ffb Mon Sep 17 00:00:00 2001
From: Chris Bandy
Date: Sat, 29 Feb 2020 12:47:56 -0600
Subject: [PATCH] Add schema and table names to partition errors

---
 src/backend/commands/tablecmds.c      | 6 ++++--
 src/backend/executor/execMain.c       | 3 ++-
 src/backend/executor/execPartition.c  | 3 ++-
 src/backend/partitioning/partbounds.c | 3 ++-
 4 files changed, 10 insertions(+), 5 deletions(-)

diff --git src/backend/commands/tablecmds.c src/backend/commands/tablecmds.c
index 02a7c04fdb..6a32812a1f 100644
--- src/backend/commands/tablecmds.c
+++ src/backend/commands/tablecmds.c
@@ -5342,12 +5342,14 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 				ereport(ERROR,
 						(errcode(ERRCODE_CHECK_VIOLATION),
 						 errmsg("updated partition constraint for default partition \"%s\" would be violated by some row",
-								RelationGetRelationName(oldrel))));
+								RelationGetRelationName(oldrel)),
+						 errtable(oldrel)));
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_CHECK_VIOLATION),
 						 errmsg("partition constraint of relation \"%s\" is violated by some row",
-								RelationGetRelationName(oldrel))));
+								RelationGetRelationName(oldrel)),
+						 errtable(oldrel)));
 		}
 
 		/* Write the tuple out to the new relation */
diff --git src/backend/executor/execMain.c src/backend/executor/execMain.c
index ee5c3a60ff..7cb486b211 100644
--- src/backend/executor/execMain.c
+++ src/backend/executor/execMain.c
@@ -1878,7 +1878,8 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo,
 			(errcode(ERRCODE_CHECK_VIOLATION),
 			 errmsg("new row for relation \"%s\" violates partition constraint",
 					RelationGetRelationName(resultRelInfo->ri_RelationDesc)),
-			 val_desc ? errdetail("Failing row contains %s.", val_desc) : 0));
+			 val_desc ? errdetail("Failing row contains %s.", val_desc) : 0,
+			 errtable(resultRelInfo->ri_RelationDesc)));
 }
 
 /*
diff --git src/backend/executor/execPartition.c src/backend/executor/execPartition.c
index c13b1d3501..a5542b92c7 100644
--- src/backend/executor/execPartition.c
+++ src/backend/executor/execPartition.c
@@ -345,7 +345,8 @@ ExecFindPartition(ModifyTableState *mtstate,
 							RelationGetRelationName(rel)),
 					 val_desc ?
 					 errdetail("Partition key of the failing row contains %s.",
-							   val_desc) : 0));
+							   val_desc) : 0,
+					 errtable(rel)));
 		}
 
 		if (partdesc->is_leaf[partidx])
diff --git src/backend/partitioning/partbounds.c src/backend/partitioning/partbounds.c
index 35953f23fa..4c47f54a57 100644
--- src/backend/partitioning/partbounds.c
+++ src/backend/partitioning/partbounds.c
@@ -1366,7 +1366,8 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 			ereport(ERROR,
 					(errcode(ERRCODE_CHECK_VIOLATION),
 					 errmsg("updated partition constraint for default partition \"%s\" would be violated by some row",
-							RelationGetRelationName(default_rel))));
+							RelationGetRelationName(default_rel)),
+							errtable(default_rel)));
 
 		ResetExprContext(econtext);
 		CHECK_FOR_INTERRUPTS();
-- 
2.11.0

>From 2d2ab39bcbc7ea13cd04986b0c8aad6f14449ebd Mon Sep 17 00:00:00 2001
From: Chris Bandy
Date: Mon, 2 Mar 2020 22:13:28 -0600
Subject: [PATCH] Tests for partition error fields

---
 src/test/regress/expected/partition_errors.out | 37
Re: logical replication empty transactions
On Mon, Mar 2, 2020 at 4:56 PM Amit Kapila wrote:
> On Mon, Mar 2, 2020 at 9:01 AM Dilip Kumar wrote:
> > On Sat, Nov 9, 2019 at 7:29 AM Euler Taveira wrote:
> > > On Mon, Oct 21, 2019 at 21:20, Jeff Janes wrote:
> > > >
> > > > After setting up logical replication of a slowly changing table using
> > > > the built in pub/sub facility, I noticed way more network traffic than
> > > > made sense. Looking into it I see that every transaction in that
> > > > database on the master gets sent to the replica. 99.999+% of them are
> > > > empty transactions ('B' message and 'C' message with nothing in
> > > > between) because the transactions don't touch any tables in the
> > > > publication, only non-replicated tables. Is doing it this way
> > > > necessary for some reason? Couldn't we hold the transmission of 'B'
> > > > until something else comes along, and then if that next thing is 'C'
> > > > drop both of them?
> > >
> > > That is not optimal. Those empty transactions are a waste of bandwidth.
> > > We can suppress them if no changes will be sent. test_decoding
> > > implements "skip empty transaction" as you described above and I did
> > > something similar to it. Patch is attached.
> >
> > I think this significantly reduces the network bandwidth for empty
> > transactions. I have briefly reviewed the patch and it looks good to me.
>
> One thing that is not clear to me is how we will advance restart_lsn
> if we don't send any empty xact in a system where there are many such
> xacts. IIRC, the restart_lsn is advanced based on the confirmed_flush
> lsn sent by the subscriber. After this change, the subscriber won't be
> able to send the confirmed_flush, and for a long time we won't be able
> to advance restart_lsn. Is that correct, and if so, why do we think that
> is acceptable? One might argue that restart_lsn will be advanced as soon
> as we send the first non-empty xact, but I am not sure if that is good
> enough. What do you think?
It seems like a valid point. One idea could be that we track the last commit
LSN which we streamed, and if the confirmed flush location is already greater
than or equal to that, then even if we skip sending the commit message we can
advance the confirmed flush location locally. Logically, it should not cause
any problem, because we have already got the confirmation for whatever we
have streamed so far; for the other commits (which we are skipping) we can
advance it locally, since we are sure that we don't have any streamed commit
which is not yet confirmed by the subscriber. This is just my thought, but if
we think about it from the code and design perspective then it might
complicate things and sounds hackish.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
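The invariant sketched in the reply above — advance the confirmed flush location locally for a skipped empty commit only when everything actually streamed has already been confirmed — might look roughly like this. All names here are hypothetical; this is a sketch of the proposed rule, not pgoutput/walsender code:

```python
class SlotState:
    """Toy model of a replication slot's feedback bookkeeping."""

    def __init__(self):
        self.last_streamed_commit_lsn = 0  # last commit actually sent downstream
        self.confirmed_flush_lsn = 0       # last LSN the subscriber confirmed

    def on_commit(self, commit_lsn, is_empty):
        if not is_empty:
            # Stream it and wait for subscriber feedback as usual.
            self.last_streamed_commit_lsn = commit_lsn
            return "streamed"
        # Empty transaction: it is only safe to advance locally if the
        # subscriber has already confirmed everything we ever streamed.
        if self.confirmed_flush_lsn >= self.last_streamed_commit_lsn:
            self.confirmed_flush_lsn = max(self.confirmed_flush_lsn, commit_lsn)
            return "skipped, advanced locally"
        return "skipped, waiting for feedback"

s = SlotState()
s.on_commit(100, is_empty=False)   # streamed; confirmation still pending
assert s.on_commit(110, is_empty=True) == "skipped, waiting for feedback"
s.confirmed_flush_lsn = 100        # subscriber feedback for LSN 100 arrives
assert s.on_commit(120, is_empty=True) == "skipped, advanced locally"
assert s.confirmed_flush_lsn == 120
```

This captures why the approach is safe in principle (nothing unconfirmed is ever skipped past) while also hinting at the extra state the real code would have to maintain, which is the complication the reply worries about.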
Re: First WAL segment file that initdb creates
On 2020/02/19 5:26, David Zhang wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  not tested
> Implements feature:       not tested
> Spec compliant:           not tested
> Documentation:            tested, passed
>
> The issue has been verified using below steps:
> 1. $ initdb -D /home/test/PG122DATA/data
> 2. $ ls -l /home/test/PG122DATA/data/pg_wal/
> total 16388
> -rw------- 1 test test 16777216 Feb 18 12:07 000000010000000000000001
> drwx------ 2 test test     4096 Feb 18 12:07 archive_status
>
> The first WAL segment file created by initdb is
> "000000010000000000000001", not "000000010000000000000000".

Thanks for the test! I pushed the patch.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
Re: [PATCH] Documentation bug related to client authentication using TLS certificate
Hi, Cary.

On 3/2/20 1:06 PM, Cary Huang wrote:
> Hi
>
> I found a document bug about client authentication using TLS certificate.
> When clientcert authentication is enabled in pg_hba.conf, libpq does not
> verify that the *common name* in certificate matches *database username*
> like it is described in the documentation before allowing client
> connection. Instead, when sslmode is set to “verify-full”, libpq will
> verify if the *server host name* matches the *common name* in client
> certificate.

This sounds incorrect. My understanding is that the *server* host name is
always matched with the *server* common name.

> When sslmode is set to “verify-ca”, libpq will verify that the client is
> trustworthy by checking the certificate trust chain up to the root
> certificate and it does not verify *server hostname* and certificate
> *common name* match in this case.

Similarly, libpq will verify the *server* is trustworthy by checking the
*server* certificate up to the root. It does not verify that the host name
matches the common name in the *server* certificate.

In all cases, libpq is responsible for verifying the *server* is who it
claims to be.

-- 
Chris
Re: Symbolic names for the values of typalign and typstorage
Alvaro Herrera writes:
> On 2020-Mar-02, Tom Lane wrote:
> > One thing that I'm not totally happy about, as this stands, is that
> > we have to #include "catalog/pg_type.h" in various places we did
> > not need to before (although only a fraction of the files I touched
> > need that).
>
> If we think that pg_type.h is the header to handle access to the pg_type
> catalog, then I would think that the function declarations at the bottom
> should be in some "internal" header file; then we can get rid of most
> the #includes in pg_type.h.

Well, aside from indirect inclusions, pg_type.h also brings in a bunch of
type OID macros, which I feel we don't want to broadcast everywhere.

One argument in favor of sticking these new macros somewhere "more central"
is that they apply to both pg_type and pg_attribute (that is, attalign and
attstorage also use them). That's not a strong argument, maybe, but it's
something.

			regards, tom lane
Re: Fastpath while arranging the changes in LSN order in logical decoding
On Mon, Mar 2, 2020 at 7:27 PM David Steele wrote:
>
> Hi Dilip,
>
> On 2/18/20 7:30 PM, David Zhang wrote:
> > After manually applied the patch, a diff regenerated is attached.
>
> David's updated patch applies but all logical decoding regression tests
> are failing on cfbot.
>
> Do you know when you will be able to supply an updated patch?

I will try to send in a day or two.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
On 2020/03/03 9:27, Kyotaro Horiguchi wrote:
> At Mon, 2 Mar 2020 17:29:30 +0900, Fujii Masao wrote in
> > > Attached is the updated version of the patch.
> > > The previous patch used only pgstat_progress_update_param() even
> > > when updating multiple values. Since those updates are not atomic,
> > > this can cause readers of the values to see the intermediate states.
> > > To avoid this issue, the latest patch uses
> > > pgstat_progress_update_multi_param(), instead.
> >
> > Attached the updated version of the patch.
> > Barring any objections, I plan to commit this patch.
>
> It is working as designed and the status names are fine to me.

Thanks for the review! I pushed the patch.

> The last one comment from me.
> The newly defined symbols have inconsistent indents.

Yes, I fixed that.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
On 2020/03/02 19:27, Sergei Kornilov wrote:
> Hello
>
> I reviewed the recently published patch. Looks good to me.

Thanks for the review! I pushed the patch.

> One small note: the values for the new definitions in progress.h seem
> not to be aligned vertically. However, pgindent doesn't object.

Yes, I fixed that.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
Re: Add LogicalTapeSetExtend() to logtape.c
On Fri, Feb 28, 2020 at 12:38:55PM -0800, Jeff Davis wrote:
> On Fri, 2020-02-28 at 14:16 +0800, Adam Lee wrote:
> > I noticed another difference, I was using palloc0(), which could be
> > one of the reasons, but not sure.
>
> I changed the palloc0()'s in your code to plain palloc(), and it didn't
> make any perceptible difference. Still slower than the version I posted
> that keeps the flexible array.
>
> Did you compare all 3? Master, with your patch, and with my patch? I'd
> like to see if you're seeing the same thing that I am.
>
> > Tested your hashagg-20200226.patch on my laptop (Apple clang version
> > 11.0.0), the average time is 25.9s:
>
> That sounds high -- my runs are about half that time. Is that with a
> debug build or an optimized one?
>
> Regards,
> Jeff Davis

Yes, I was running a debug version. I usually do 'CFLAGS=-O0 -g3'
'--enable-cassert' '--enable-debug'.

Test with a general build:

Master:                            12729ms  12970ms  12999ms
With my patch (a pointer):         12965ms  13273ms  13116ms
With your patch (flexible array):  12906ms  12991ms  13043ms

Not obvious, I suppose; anyway, your patch looks good to me.

-- 
Adam Lee
Re: SQL/JSON: functions
On 2020-03-03 00:24, Nikita Glukhov wrote:
> On 03.03.2020 2:12, Erik Rijkers wrote:
> > On 2020-03-02 23:33, Nikita Glukhov wrote:
> > > Attached 42th version of the patches.
> >
> > 20200302/0001-Jsonpath-support-for-json-v42.patch    +
> > 20200302/0002-Add-common-SQL_JSON-clauses-v42.patch  +
> > 20200302/0003-Add-invisible-coercion-form-v42.patch  +
> > 20200302/0004-Add-function-formats-v42.patch         +
> > 20200302/0005-SQLJSON-constructors-v42.patch         +
> > 20200302/0006-IS-JSON-predicate-v42.patch            +
> > 20200302/0007-SQLJSON-query-functions-v42.patch      +

Thanks -- those applied fine. Compiling, I get this error from the
pg_stat_statements module (in contrib):

-- [2020.03.03 02:22:20 json_functions/0] make contrib
pg_stat_statements.c: In function ‘JumbleExpr’:
pg_stat_statements.c:2933:29: error: ‘JsonExpr’ {aka ‘struct JsonExpr’} has no member named ‘raw_expr’
 2933 |    JumbleExpr(jstate, jexpr->raw_expr);
      |                             ^~
make[1]: *** [pg_stat_statements.o] Error 1
make: *** [all-pg_stat_statements-recurse] Error 2
-- contrib make returned 2 - abort
../../src/Makefile.global:919: recipe for target 'pg_stat_statements.o' failed
Makefile:93: recipe for target 'all-pg_stat_statements-recurse' failed
pg_stat_statements.c: In function ‘JumbleExpr’:
pg_stat_statements.c:2933:29: error: ‘JsonExpr’ {aka ‘struct JsonExpr’} has no member named ‘raw_expr’
 2933 |    JumbleExpr(jstate, jexpr->raw_expr);
      |                             ^~
make[1]: *** [pg_stat_statements.o] Error 1
make: *** [install-pg_stat_statements-recurse] Error 2
-- contrib make install returned 2 - abort

(so I've edited out pg_stat_statements from the contrib/Makefile and
compiled a working server for further testing.)

Thanks,
Erik Rijkers
Re: ALTER tbl rewrite loses CLUSTER ON index
@cfbot: resending with only Amit's 0001, since Michael pushed a variation on 0002.

-- 
Justin

>From 865fc2713ad29d0f8c0f63609a7c15c83cfa5cfe Mon Sep 17 00:00:00 2001
From: Amit Langote
Date: Thu, 6 Feb 2020 18:14:16 +0900
Subject: [PATCH v5] ALTER tbl rewrite loses CLUSTER ON index

---
 src/backend/commands/tablecmds.c          | 39 ++++++++++++++++++++++
 src/test/regress/expected/alter_table.out | 34 ++++++++++++++++++
 src/test/regress/sql/alter_table.sql      | 16 ++++++++-
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 02a7c04fdb..6b2469bd09 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -490,6 +490,7 @@ static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
 static void RebuildConstraintComment(AlteredTableInfo *tab, int pass, Oid objid,
 									 Relation rel, List *domname,
 									 const char *conname);
+static void PreserveClusterOn(AlteredTableInfo *tab, int pass, Oid indoid);
 static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
 static void TryReuseForeignKey(Oid oldId, Constraint *con);
 static ObjectAddress ATExecAlterColumnGenericOptions(Relation rel, const char *colName,
@@ -11838,6 +11839,9 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 			newcmd->def = (Node *) stmt;
 			tab->subcmds[AT_PASS_OLD_INDEX] =
 				lappend(tab->subcmds[AT_PASS_OLD_INDEX], newcmd);
+
+			/* Preserve index's indisclustered property, if set. */
+			PreserveClusterOn(tab, AT_PASS_OLD_INDEX, oldId);
 		}
 		else if (IsA(stm, AlterTableStmt))
 		{
@@ -11874,6 +11878,9 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 										 rel, NIL,
 										 indstmt->idxname);
+
+					/* Preserve index's indisclustered property, if set. */
+					PreserveClusterOn(tab, AT_PASS_OLD_INDEX, indoid);
 				}
 				else if (cmd->subtype == AT_AddConstraint)
 				{
@@ -11996,6 +12003,38 @@ RebuildConstraintComment(AlteredTableInfo *tab, int pass, Oid objid,
 	tab->subcmds[pass] = lappend(tab->subcmds[pass], newcmd);
 }
 
+/*
+ * For a table's index that is to be recreated due to PostAlterType
+ * processing, preserve its indisclustered property by issuing ALTER TABLE
+ * CLUSTER ON command on the table that will run after the command to recreate
+ * the index.
+ */
+static void
+PreserveClusterOn(AlteredTableInfo *tab, int pass, Oid indoid)
+{
+	HeapTuple	indexTuple;
+	Form_pg_index indexForm;
+
+	Assert(OidIsValid(indoid));
+	Assert(pass == AT_PASS_OLD_INDEX);
+
+	indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indoid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indoid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+	if (indexForm->indisclustered)
+	{
+		AlterTableCmd *newcmd = makeNode(AlterTableCmd);
+
+		newcmd->subtype = AT_ClusterOn;
+		newcmd->name = get_rel_name(indoid);
+		tab->subcmds[pass] = lappend(tab->subcmds[pass], newcmd);
+	}
+
+	ReleaseSysCache(indexTuple);
+}
+
 /*
  * Subroutine for ATPostAlterTypeParse().  Calls out to CheckIndexCompatible()
  * for the real analysis, then mutates the IndexStmt based on that verdict.
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index fb6d86a269..a01c6d6ec5 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -4296,3 +4296,37 @@ create trigger xtrig
 update bar1 set a = a + 1;
 INFO:  a=1, b=1
 /* End test case for bug #16242 */
+-- alter type rewrite/rebuild should preserve cluster marking on index
+create table alttype_cluster (a int);
+create index alttype_cluster_a on alttype_cluster (a);
+alter table alttype_cluster cluster on alttype_cluster_a;
+select indisclustered from pg_index where indrelid = 'alttype_cluster'::regclass;
+ indisclustered 
+----------------
+ t
+(1 row)
+
+alter table alttype_cluster alter a type bigint;
+select indisclustered from pg_index where indrelid = 'alttype_cluster'::regclass;
+ indisclustered 
+----------------
+ t
+(1 row)
+
+drop index alttype_cluster_a;
+alter table alttype_cluster add primary key (a);
+alter table alttype_cluster cluster on alttype_cluster_pkey;
+select indisclustered from pg_index where indrelid = 'alttype_cluster'::regclass;
+ indisclustered 
+----------------
+ t
+(1 row)
+
+alter table alttype_cluster alter a type int;
+select indisclustered from pg_index where indrelid = 'alttype_cluster'::regclass;
+ indisclustered 
+----------------
+ t
+(1 row)
+
+drop table alttype_cluster;
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 3801f19c58..6e9048bbec 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -2802,7 +2802,6 @@ drop table at_test_sql_partop;
 drop operator class at_test_sql_partop using btree;
 drop function at_test_sql_partop;
Re: ALTER tbl rewrite loses CLUSTER ON index
On Mon, Mar 02, 2020 at 12:28:18PM +0900, Michael Paquier wrote: > > +SELECT indexrelid::regclass FROM pg_index WHERE > > indrelid='concur_clustered'::regclass; > > This test should check after indisclustered. Except that, the patch > is fine so I'll apply it if there are no objections. Oops - I realized that, but didn't send a new patch before you noticed - thanks for handling it. -- Justin
Re: Symbolic names for the values of typalign and typstorage
On 2020-Mar-02, Tom Lane wrote:

> While looking at Tomas' ALTER TYPE patch, I got annoyed by the fact
> that all of the backend writes constants of type alignment and type
> storage values as literal characters, such as 'i' and 'x'. This is
> not our style for most other "poor man's enum" catalog columns, and
> it makes it really hard to grep for relevant code. Hence, attached
> is a proposed patch to invent #define names for those values.

Makes sense.

> As is our custom for other similar catalog columns, I only used the
> macros in C code. There are some references in SQL code too,
> particularly in the regression tests, but the difficulty of replacing
> symbolic references in SQL code seems more than it's worth to fix.

Agreed.

> One thing that I'm not totally happy about, as this stands, is that
> we have to #include "catalog/pg_type.h" in various places we did
> not need to before (although only a fraction of the files I touched
> need that). Part of the issue is that I used the TYPALIGN_XXX
> macros in tupmacs.h, but did not #include pg_type.h there because
> I was concerned about macro inclusion bloat. Plausible alternatives
> to the way I did it here include
>
> * just bite the bullet and #include pg_type.h in tupmacs.h;

I like this one the most -- better than the alternative in the patch --
because it's the most honest IMO, except that there seems to be
altogether too much cruft in pg_type.h that should be elsewhere
(particularly nodes/nodes.h, which includes a large number of other
headers). If we think that pg_type.h is the header to handle access to
the pg_type catalog, then I would think that the function declarations
at the bottom should be in some "internal" header file; then we can get
rid of most of the #includes in pg_type.h.

> Thoughts? Anybody want to say that this is more code churn
> than it's worth?

It seems worthy cleanup to me.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()
On Sun, Mar 1, 2020 at 12:15 PM Floris Van Nee wrote: > Minor conflict with that patch indeed. Attached is rebased version. I'm > running some tests now - will post the results here when finished. Thanks. We're going to have to go back to my original approach to inlining. At least, it seemed to be necessary to do that to get any benefit from the patch on my comparatively modest workstation (using a similar pgbench SELECT benchmark to the one that you ran). Tom also had a concern about the portability of inlining without using "static inline" -- that is another reason to avoid the approach to inlining taken by v3 + v4. It's possible (though not very likely) that performance has been impacted by the deduplication patch (commit 0d861bbb), since it updated the definition of BTreeTupleGetNAtts() itself. Attached is v5, which inlines in a targeted fashion, pretty much in the same way as the earliest version. This is the same as v4 in every other way. Perhaps you can test this. -- Peter Geoghegan v5-0001-Avoid-pipeline-stall-in-_bt_compare.patch Description: Binary data v5-0002-Inline-_bt_compare.patch Description: Binary data
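As context for the portability concern Tom raised, the conventional way to get inlining without nonstandard linkage behavior is a `static inline` definition in a header, which is well-defined C99 regardless of compiler. A minimal illustrative sketch (the function here is a hypothetical stand-in, not the actual `_bt_compare` signature):

```c
#include <assert.h>

/*
 * In PostgreSQL headers, hot-path helpers are typically defined
 * "static inline": every translation unit that includes the header
 * gets its own inlinable copy, avoiding the compiler-specific linkage
 * rules of a plain "inline" definition.  Hypothetical comparator
 * standing in for something like _bt_compare:
 */
static inline int
bt_compare_sketch(int a, int b)
{
	/* returns -1, 0, or +1, branch-free */
	return (a > b) - (a < b);
}
```

The alternative that v3/v4 apparently used (relying on the compiler to inline an extern function across files) is what raises the portability question; `static inline` sidesteps it at the cost of a copy per translation unit.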
Re: ALTER tbl rewrite loses CLUSTER ON index
On Mon, Mar 02, 2020 at 12:28:18PM +0900, Michael Paquier wrote: > This test should check after indisclustered. Except that, the patch > is fine so I'll apply it if there are no objections. I got a second look at this one, and applied it down to 12 after some small modifications in the new test and in the comments. -- Michael signature.asc Description: PGP signature
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
At Mon, 2 Mar 2020 17:29:30 +0900, Fujii Masao wrote in
> > Attached is the updated version of the patch.
> > The previous patch used only pgstat_progress_update_param()
> > even when updating multiple values. Since those updates are
> > not atomic, this can cause readers of the values to see
> > the intermediate states. To avoid this issue, the latest patch
> > uses pgstat_progress_update_multi_param(), instead.
>
> Attached the updated version of the patch.
> Barring any objections, I plan to commit this patch.

It is working as designed and the status names are fine by me. One last
comment from me: the newly defined symbols have inconsistent indents.

===
#define PROGRESS_BASEBACKUP_PHASE 0
#define PROGRESS_BASEBACKUP_BACKUP_TOTAL 1
#define PROGRESS_BASEBACKUP_BACKUP_STREAMED 2
#define PROGRESS_BASEBACKUP_TBLSPC_TOTAL 3
#define PROGRESS_BASEBACKUP_TBLSPC_STREAMED 4

/* Phases of pg_basebackup (as advertised via PROGRESS_BASEBACKUP_PHASE) */
#define PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT 1
#define PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE 2
#define PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP 3
#define PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE 4
#define PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL 5

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
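For comparison, a consistently indented layout of the same constants might look like the sketch below (values exactly as quoted in the review; only the formatting differs):

```c
/* Progress parameters for pg_basebackup, values as quoted above */
#define PROGRESS_BASEBACKUP_PHASE                       0
#define PROGRESS_BASEBACKUP_BACKUP_TOTAL                1
#define PROGRESS_BASEBACKUP_BACKUP_STREAMED             2
#define PROGRESS_BASEBACKUP_TBLSPC_TOTAL                3
#define PROGRESS_BASEBACKUP_TBLSPC_STREAMED             4

/* Phases of pg_basebackup (as advertised via PROGRESS_BASEBACKUP_PHASE) */
#define PROGRESS_BASEBACKUP_PHASE_WAIT_CHECKPOINT       1
#define PROGRESS_BASEBACKUP_PHASE_ESTIMATE_BACKUP_SIZE  2
#define PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP         3
#define PROGRESS_BASEBACKUP_PHASE_WAIT_WAL_ARCHIVE      4
#define PROGRESS_BASEBACKUP_PHASE_TRANSFER_WAL          5
```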
Re: Some improvements to numeric sqrt() and ln()
Dear Dean,

On 2020-03-01 20:47, Dean Rasheed wrote:
> On Fri, 28 Feb 2020 at 08:15, Dean Rasheed wrote:
>> It's possible that there are further gains to be had in the sqrt()
>> algorithm on platforms that support 128-bit integers, but I haven't
>> had a chance to investigate that yet.
>
> Rebased patch attached, now using 128-bit integers for part of
> sqrt_var() on platforms that support them. This turned out to be well
> worth it (1.5 to 2 times faster than the previous version if the
> result has less than 30 or 40 digits).

Thank you for these patches; these sound like really nice improvements.

One thing came to my mind while reading the patch:

+* If r < 0 Then
+* Let r = r + 2*s - 1
+* Let s = s - 1

+ /* s is too large by 1; let r = r + 2*s - 1 and s = s - 1 */
+ r_int64 += 2 * s_int64 - 1;
+ s_int64--;

This can be reformulated as:

+* If r < 0 Then
+* Let r = r + s
+* Let s = s - 1
+* Let r = r + s

+ /* s is too large by 1; let r = r + s, s = s - 1, then r = r + s */
+ r_int64 += s_int64;
+ s_int64--;
+ r_int64 += s_int64;

which would remove one mul/shift and the temporary variable. Mind you,
I have not benchmarked this, so it might make little difference, but
maybe it is worth trying it.
Best regards, Telsdiff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c new file mode 100644 index 10229eb..9e6bb80 --- a/src/backend/utils/adt/numeric.c +++ b/src/backend/utils/adt/numeric.c @@ -393,16 +393,6 @@ static const NumericVar const_ten = #endif #if DEC_DIGITS == 4 -static const NumericDigit const_zero_point_five_data[1] = {5000}; -#elif DEC_DIGITS == 2 -static const NumericDigit const_zero_point_five_data[1] = {50}; -#elif DEC_DIGITS == 1 -static const NumericDigit const_zero_point_five_data[1] = {5}; -#endif -static const NumericVar const_zero_point_five = -{1, -1, NUMERIC_POS, 1, NULL, (NumericDigit *) const_zero_point_five_data}; - -#if DEC_DIGITS == 4 static const NumericDigit const_zero_point_nine_data[1] = {9000}; #elif DEC_DIGITS == 2 static const NumericDigit const_zero_point_nine_data[1] = {90}; @@ -518,6 +508,8 @@ static void div_var_fast(const NumericVa static int select_div_scale(const NumericVar *var1, const NumericVar *var2); static void mod_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result); +static void div_mod_var(const NumericVar *var1, const NumericVar *var2, + NumericVar *quot, NumericVar *rem); static void ceil_var(const NumericVar *var, NumericVar *result); static void floor_var(const NumericVar *var, NumericVar *result); @@ -7712,6 +7704,7 @@ div_var_fast(const NumericVar *var1, con NumericVar *result, int rscale, bool round) { int div_ndigits; + int load_ndigits; int res_sign; int res_weight; int *div; @@ -7766,9 +7759,6 @@ div_var_fast(const NumericVar *var1, con div_ndigits += DIV_GUARD_DIGITS; if (div_ndigits < DIV_GUARD_DIGITS) div_ndigits = DIV_GUARD_DIGITS; - /* Must be at least var1ndigits, too, to simplify data-loading loop */ - if (div_ndigits < var1ndigits) - div_ndigits = var1ndigits; /* * We do the arithmetic in an array "div[]" of signed int's. 
Since @@ -7781,9 +7771,16 @@ div_var_fast(const NumericVar *var1, con * (approximate) quotient digit and stores it into div[], removing one * position of dividend space. A final pass of carry propagation takes * care of any mistaken quotient digits. + * + * Note that div[] doesn't necessarily contain all of the digits from the + * dividend --- the desired precision plus guard digits might be less than + * the dividend's precision. This happens, for example, in the square + * root algorithm, where we typically divide a 2N-digit number by an + * N-digit number, and only require a result with N digits of precision. */ div = (int *) palloc0((div_ndigits + 1) * sizeof(int)); - for (i = 0; i < var1ndigits; i++) + load_ndigits = Min(div_ndigits, var1ndigits); + for (i = 0; i < load_ndigits; i++) div[i + 1] = var1digits[i]; /* @@ -7844,9 +7841,15 @@ div_var_fast(const NumericVar *var1, con maxdiv += Abs(qdigit); if (maxdiv > (INT_MAX - INT_MAX / NBASE - 1) / (NBASE - 1)) { -/* Yes, do it */ +/* + * Yes, do it. Note that if var2ndigits is much smaller than + * div_ndigits, we can save a significant amount of effort + * here by noting that we only need to normalise those div[] + * entries touched where prior iterations subtracted multiples + * of the divisor. + */ carry = 0; -for (i = div_ndigits; i > qi; i--) +for (i = Min(qi + var2ndigits - 2, div_ndigits); i > qi; i--) { newdig = div[i] + carry; if
Re: Improve search for missing parent downlinks in amcheck
Hi Alexander, Apologies for the delayed response. I was a little tired from the deduplication project. On Mon, Feb 24, 2020 at 2:54 PM Alexander Korotkov wrote: > Thank you for your review. The revised version is attached. This has bitrot, because of the deduplication patch. Shouldn't be hard to rebase, though. > > 'foo, -inf' > > 'foo, (1,24)' > > 'food, -inf'. <-- This pivot tuple's downlink points to the final leaf > > page that's filled with duplicates of the value 'foo' > > 'food, (3,19)' <-- This pivot tuple's downlink points to the *first* > > leaf page that's filled with duplicates of the value 'food' > > ... > Thank you for the explanation! I taught pageinspect to display a "htid" field for pivot tuples recently, making it easier to visualize this example. I think that you should say more about how "lowkey" is used here: > /* > -* Record if page that is about to become target is the right half of > -* an incomplete page split. This can go stale immediately in > -* !readonly case. > +* Copy current target low key as the high key of right sibling. > +* Allocate memory in upper level context, so it would be cleared > +* after reset of target context. > +* > +* We only need low key for parent check. > */ > - state->rightsplit = P_INCOMPLETE_SPLIT(opaque); > + if (state->readonly && !P_RIGHTMOST(opaque)) > + { Say something about concurrent page splits, since they're the only case where we actually use lowkey. Maybe say something like: "We probably won't end up doing anything with lowkey, but it's simpler for readonly verification to always have it available". > Actually, lowkey is used for removing "cousin page verification blind > spot" when there are incomplete splits. It might happen that we read > child with hikey matching its parent high key only when > bt_child_highkey_check() is called for "minus infinity" key of parent > right sibling. Saving low key helps in this case. That makes sense to me. 
> It appears that these false positives were cause by very basic error made > here: > > if (!first && !P_IGNORE(opaque)) > { > /* blkno probably has missing parent downlink */ > bt_downlink_missing_check(state, rightsplit, blkno, page); > } > > actually it should be > > if (blkno != downlink && !P_IGNORE(opaque)) > { > /* blkno probably has missing parent downlink */ > bt_downlink_missing_check(state, rightsplit, blkno, page); > } > > So "blkno == downlink" means blkno has downlink, not being checked > first in the loop. This is remains of old version of patch which I > forget to clean. Now the check you've described works for me. > > If you still think lowkey check is a problem, please help me figure it out. * I think that these comments could still be clearer: > + /* > +* We're going to try match child high key to "negative > +* infinity key". This normally happens when the last child > +* we visited for target's left sibling was an incomplete > +* split. So, we must be still on the child of target's left > +* sibling. Thus, we should match to target's left sibling > +* high key. Thankfully we saved it, it's called a "low key". > +*/ Maybe start with "We cannot try to match child's high key to a negative infinity key in target, since there is nothing to compare. However...". Perhaps use terms like "cousin page" and "subtree", which can be useful. Alternatively, mention this case in the diagram example at the top of bt_child_highkey_check(). It's tough to write comments like this, but I think it's worth it. Note that a high key is also a pivot tuple, so I wouldn't mention high keys here: > +/* > + * Check if two tuples are binary identical except the block number. So, > + * this function is capable to compare high keys with pivot keys. > + */ > +static bool > +bt_pivot_tuple_identical(IndexTuple itup1, IndexTuple itup2) > +{ v7 looks pretty close to being commitable, though I'll probably want to update some comments that you haven't touched when you commit this. 
I should probably wait until you've committed the patch to go do that. I'm thinking of things like old comments in bt_downlink_check(). I will test the patch properly one more time when you produce a new revision. I haven't really tested it since the last time. -- Peter Geoghegan
Re: Internal key management system
Hi Masahiko Please see below my comments regarding kms_v4.patch that you have shared earlier. (1) There is a discrepancy between the documentation and the actual function definition. For example in func.sgml, it states pg_wrap_key takes argument in text data type but in pg_proc.dat and kmgr.c, the function is defined to take argument in bytea data type. ===> doc/src/sgml/func.sgml + + + pg_wrap_key + + pg_wrap_key(data text) + + + bytea + ===> src/include/catalog/pg_proc.dat +{ oid => '8201', descr => 'wrap the given secret', + proname => 'pg_wrap_key', + provolatile => 'v', prorettype => 'bytea', + proargtypes => 'bytea', prosrc => 'pg_wrap_key' }, ===> src/backend/crypto/kmgr.c +Datum +pg_wrap_key(PG_FUNCTION_ARGS) +{ + bytea *data = PG_GETARG_BYTEA_PP(0); + bytea *res; + int datalen; + int reslen; + int len; + (2) I think the documentation needs to make clear the difference between a key and a user secret. Some parts of it are trying to mix the 2 terms together when they shouldn't. To my understanding, a key is expressed as binary data that is actually used in the encryption and decryption operations. A user secret, on the other hand, is more like a passphrase, expressed as string, that is used to derive a encryption key for subsequent encryption operations. If I just look at the function names "pg_wrap_key" and "pg_unwrap_key", I immediately feel that these functions are used to encapsulate and uncover cryptographic key materials. The input and output to these 2 functions should all be key materials expressed in bytea data type. In previous email discussion, there was only one key material in discussion, called the master key (generated during initdb and stored in cluster), and this somehow automatically makes people (including myself) associate pg_wrap_key and pg_unwrap_key to be used on this master key and raise a bunch of security concerns around it. 
Having read the documentation provided by the patch describing pg_wrap_key and pg_unwrap_key, they seem to serve another purpose. It states that pg_wrap_key is used to encrypt a user-supplied secret (text) with the master key and produce a wrapped secret while pg_unwrap_key does the opposite, so we can prevent user from having to enter the secret in plaintext when using pgcrypto functions. This use case is ok but user secret is not really a cryptographic key material is it? It is more similar to a secret passphrase expressed in text and pg_wrap_key is merely used to turn the passphrase into a wrapped passphrase expressed in bytea. If I give pg_wrap_key with a real key material expressed in bytea, I will not be able to unwrap it properly: Select pg_unwrap_key (pg_wrap_key(E'\\x2b073713476f9f0e761e45c64be8175424d2742e7d53df90b6416f1d84168e8a') ); pg_unwrap_key -- +\x077\x13Go\x0Ev\x1EEK\x17T$t.}SߐAo\x1D\x16 (1 row) Maybe we should rename these SQL functions like this to prevent confusion. => pg_wrap_secret (takes a text, returns a bytea) => pg_unwrap_secret(takes a bytea, returns a text) if there is a use case for users to encapsulate key materials then we can define 2 more wrap functions for these, if there is no use case, don't bother: => pg_wrap_key (takes a bytea, returns a bytea) => pg_unwrap_key (takes a bytea, returns a bytea) (3) I would rephrase "chapter 32: Encryption Key Management Part III. Server Administration" documentation like this: = PostgreSQL supports Encryption Key Management System, which is enabled when PostgreSQL is built with --with-openssl option and cluster_passphrase_command is specified during initdb process. The user-provided cluster_passphrase_command in postgresql.conf and the cluster_passphrase_command specified during initdb process must match, otherwise, the database cluster will not start up. 
The user-provided cluster passphrase is derived into a Key Encryption Key (KEK), which is used to encapsulate the Master Encryption Key generated during the initdb process. The encapsulated Master Encryption Key is stored inside the database cluster. Encryption Key Management System provides several functions to allow users to use the master encryption key to wrap and unwrap their own encryption secrets during encryption and decryption operations. This feature allows users to encrypt and decrypt data without knowing the actual key. = (4) I would rephrase "chapter 32.2: Wrap and Unwrap user secret" documentation like this: Note that I rephrase based on (2) and uses pg_(un)wrap_secret instead. = Encryption key management System provides several functions described in Table 9.97, to
Re: Improving connection scalability: GetSnapshotData()
Hi,

On 2020-03-01 00:36:01 -0800, Andres Freund wrote:
> Here are some numbers for the submitted patch series. I had to cull
> some further improvements to make it more manageable, but I think the
> numbers still are quite convincing.
>
> The workload is a pgbench readonly, with pgbench -M prepared -c $conns
> -j $conns -S -n for each client count. This is on a machine with 2
> Intel(R) Xeon(R) Platinum 8168, but virtualized.
>
> conns   tps master          tps pgxact-split
>
> 1       26842.492845        26524.194821
> 10      246923.158682       249224.782661
> 50      695956.539704       709833.746374
> 100     1054727.043139      1903616.306028
> 200     964795.282957       1949200.338012
> 300     906029.377539       1927881.231478
> 400     845696.690912       1911065.369776
> 500     812295.222497       1926237.255856
> 600     888030.104213       1903047.236273
> 700     866896.532490       1886537.202142
> 800     863407.341506       1883768.592610
> 900     871386.608563       1874638.012128
> 1000    887668.277133       1876402.391502
> 1500    860051.361395       1815103.564241
> 2000    890900.098657       1775435.271018
> 3000    874184.980039       1653953.817997
> 4000    845023.080703       1582582.316043
> 5000    817100.195728       1512260.802371
>
> I think these are pretty nice results.

> One further cool recognition of the fact that GetSnapshotData()'s
> results can be made to only depend on the set of xids in progress, is
> that caching the results of GetSnapshotData() is almost trivial at that
> point: We only need to recompute snapshots when a toplevel transaction
> commits/aborts.
>
> So we can avoid rebuilding snapshots when no commit has happened since
> it was last built. Which amounts to assigning a current 'commit
> sequence number' to the snapshot, and checking that against the current
> number at the time of the next GetSnapshotData() call. Well, turns out
> there's this "LSN" thing we assign to commits (there are some small
> issues with that though). I've experimented with that, and it
> considerably further improves the numbers above.
> Both with a higher peak throughput, but more
> importantly it almost entirely removes the throughput regression from
> 2000 connections onwards.
>
> I'm still working on cleaning that part of the patch up, I'll post it
> in a bit.

I triggered a longer run on the same hardware, that also includes
numbers for the caching patch.

nclients    master          pgxact-split    pgxact-split-cache
1           29742.805074    29086.874404    28120.709885
2           58653.005921    56610.432919    57343.937924
3           116580.383993   115102.94057    117512.656103
4           150821.023662   154130.354635   152053.714824
5           186679.754357   189585.156519   191095.841847
6           219013.756252   223053.409306   224480.026711
7           256861.673892   256709.57311    262427.179555
8           291495.547691   294311.524297   296245.219028
9           332835.641015   333223.666809   335460.280487
10          367883.74842    373562.206447   375682.894433
15          561008.204553   578601.577916   587542.061911
20          748000.911053   794048.140682   810964.700467
25          904581.660543   1037279.089703  1043615.577083
30          999231.007768   1251113.123461  1288276.726489
35          1001274.289847  1438640.653822  1438508.432425
40          991672.445199   1518100.079695  1573310.171868
45          994427.395069   1575758.31948   1649264.339117
50          1017561.371878  1654776.716703  1715762.303282
60          993943.210188   1720318.989894  1789698.632656
70          971379.995255   1729836.303817  1819477.25356
80          966276.137538   1744019.347399  1842248.57152
90          901175.211649   1768907.069263  1847823.970726
100         803175.74326    1784636.397822  1865795.782943
125         664438.039582   1806275.514545  1870983.64688
150         623562.201749   1796229.009658  1876529.428419
175         680683.150597   1809321.487338  1910694.40987
200         668413.988251   1833457.942035  1878391.674828
225         682786.299485   1816577.462613  1884587.77743
250         727308.562076   1825796.324814  1864692.025853
275         676295.999761   1843098.107926  1908698.584573
300         698831.398432   1832068.168744  1892735.290045
400         661534.639489   1859641.983234  1898606.247281
500         645149.788352   1851124.475202  1888589.134422
600         740636.323211   1875152.669115  1880653.747185
700         858645.363292   1833527.505826  1874627.969414
800         858287.957814   1841914.668668  1892106.319085
900         882204.933544   1850998.221969  1868260.041595
1000        910988.551206   1836336.091652  1862945.18557
1500        917727.92827    1808822.338465  1864150.00307
2000        982137.053108   1813070.209217  1877104.342864
3000        1013514.639108  1753026.733843  1870416.924248
4000        1025476.80688   1600598.543635  1859908.314496
5000        1019889.160511  1534501.389169  1870132.571895
7500        968558.864242   1352137.828569  1853825.376742
1           887558.112017
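The "commit sequence number" caching idea Andres describes can be sketched in a few lines: a snapshot records the counter value it was built at, and as long as no top-level transaction has committed or aborted since, the expensive proc-array scan is skipped. All names below are hypothetical stand-ins, not PostgreSQL's actual API:

```c
#include <assert.h>
#include <stdint.h>

/* Bumped (in shared memory, in the real system) on every commit/abort. */
static uint64_t shared_commit_seqno;

typedef struct SnapshotSketch
{
	uint64_t	built_at_seqno;	/* seqno observed when snapshot was built */
	int			valid;
	/* ... xmin, xmax, xip[] would live here ... */
} SnapshotSketch;

/* Counts how often we pay for the expensive rebuild. */
static int	snapshot_rebuilds;

static void
get_snapshot(SnapshotSketch *snap)
{
	uint64_t	cur = shared_commit_seqno;

	if (snap->valid && snap->built_at_seqno == cur)
		return;					/* cache hit: nothing committed since */

	/* cache miss: the real code would scan the proc array here */
	snapshot_rebuilds++;
	snap->built_at_seqno = cur;
	snap->valid = 1;
}
```

The subtlety Andres alludes to ("there's this LSN thing we assign to commits") is that the commit LSN can serve as a ready-made sequence number, modulo the "small issues" he mentions.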
Re: SQL/JSON: functions
On 2020-03-02 23:33, Nikita Glukhov wrote:
> Attached 42nd version of the patches.
>
> v1-0001-Add-jsonpath-pg-modifier-for-enabling-extensions.patch
> v1-0002-Add-raw-jbvArray-and-jbvObject-support-to-jsonpat.patch
> v1-0003-Add-jsonpath-sequence-constructors.patch
> v1-0004-Add-jsonpath-array-constructors.patch
> v1-0005-Add-jsonpath-object-constructors.patch
> v1-0006-Add-jsonpath-object-subscripting.patch

I can't get these to apply against master.

$ patch --dry-run -b -l -F 15 -p 1 < 0001-Jsonpath-support-for-json-v42.patch
can't find file to patch at input line 38

(tried -p 0 and -p 1 -- am I doing it wrong?)

Maybe it needs something else applied first?

Erik Rijkers
Symbolic names for the values of typalign and typstorage
While looking at Tomas' ALTER TYPE patch, I got annoyed by the fact
that all of the backend writes constants of type alignment and type
storage values as literal characters, such as 'i' and 'x'. This is
not our style for most other "poor man's enum" catalog columns, and
it makes it really hard to grep for relevant code. Hence, attached
is a proposed patch to invent #define names for those values.

As is our custom for other similar catalog columns, I only used the
macros in C code. There are some references in SQL code too,
particularly in the regression tests, but the difficulty of replacing
symbolic references in SQL code seems more than it's worth to fix.

One thing that I'm not totally happy about, as this stands, is that
we have to #include "catalog/pg_type.h" in various places we did
not need to before (although only a fraction of the files I touched
need that). Part of the issue is that I used the TYPALIGN_XXX
macros in tupmacs.h, but did not #include pg_type.h there because
I was concerned about macro inclusion bloat. Plausible alternatives
to the way I did it here include

* just bite the bullet and #include pg_type.h in tupmacs.h;

* keep using the hard-coded values in tupmacs.h (with a comment as to
  why);

* put the TYPALIGN_XXX #defines somewhere else (not clear where, but
  there might be a case for postgres.h, since so much of the backend
  has some interest in alignment).

Thoughts? Anybody want to say that this is more code churn
than it's worth?
regards, tom lane diff --git a/contrib/hstore/hstore_gin.c b/contrib/hstore/hstore_gin.c index 4c3a422..9085302 100644 --- a/contrib/hstore/hstore_gin.c +++ b/contrib/hstore/hstore_gin.c @@ -119,7 +119,7 @@ gin_extract_hstore_query(PG_FUNCTION_ARGS) text *item; deconstruct_array(query, - TEXTOID, -1, false, 'i', + TEXTOID, -1, false, TYPALIGN_INT, _datums, _nulls, _count); entries = (Datum *) palloc(sizeof(Datum) * key_count); diff --git a/contrib/hstore/hstore_gist.c b/contrib/hstore/hstore_gist.c index e860f1e..d198c4b 100644 --- a/contrib/hstore/hstore_gist.c +++ b/contrib/hstore/hstore_gist.c @@ -555,7 +555,7 @@ ghstore_consistent(PG_FUNCTION_ARGS) int i; deconstruct_array(query, - TEXTOID, -1, false, 'i', + TEXTOID, -1, false, TYPALIGN_INT, _datums, _nulls, _count); for (i = 0; res && i < key_count; ++i) @@ -578,7 +578,7 @@ ghstore_consistent(PG_FUNCTION_ARGS) int i; deconstruct_array(query, - TEXTOID, -1, false, 'i', + TEXTOID, -1, false, TYPALIGN_INT, _datums, _nulls, _count); res = false; diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c index f3174f2..60bdbea 100644 --- a/contrib/hstore/hstore_io.c +++ b/contrib/hstore/hstore_io.c @@ -560,7 +560,7 @@ hstore_from_arrays(PG_FUNCTION_ARGS) errmsg("wrong number of array subscripts"))); deconstruct_array(key_array, - TEXTOID, -1, false, 'i', + TEXTOID, -1, false, TYPALIGN_INT, _datums, _nulls, _count); /* see discussion in hstoreArrayToPairs() */ @@ -599,7 +599,7 @@ hstore_from_arrays(PG_FUNCTION_ARGS) errmsg("arrays must have same bounds"))); deconstruct_array(value_array, - TEXTOID, -1, false, 'i', + TEXTOID, -1, false, TYPALIGN_INT, _datums, _nulls, _count); Assert(key_count == value_count); @@ -689,7 +689,7 @@ hstore_from_array(PG_FUNCTION_ARGS) } deconstruct_array(in_array, - TEXTOID, -1, false, 'i', + TEXTOID, -1, false, TYPALIGN_INT, _datums, _nulls, _count); count = in_count / 2; diff --git a/contrib/hstore/hstore_op.c b/contrib/hstore/hstore_op.c index fb1bb06..dd79d01 100644 --- 
a/contrib/hstore/hstore_op.c +++ b/contrib/hstore/hstore_op.c @@ -81,7 +81,7 @@ hstoreArrayToPairs(ArrayType *a, int *npairs) j; deconstruct_array(a, - TEXTOID, -1, false, 'i', + TEXTOID, -1, false, TYPALIGN_INT, _datums, _nulls, _count); if (key_count == 0) @@ -583,7 +583,7 @@ hstore_slice_to_array(PG_FUNCTION_ARGS) int i; deconstruct_array(key_array, - TEXTOID, -1, false, 'i', + TEXTOID, -1, false, TYPALIGN_INT, _datums, _nulls, _count); if (key_count == 0) @@ -623,7 +623,7 @@ hstore_slice_to_array(PG_FUNCTION_ARGS) ARR_NDIM(key_array), ARR_DIMS(key_array), ARR_LBOUND(key_array), - TEXTOID, -1, false, 'i'); + TEXTOID, -1, false, TYPALIGN_INT); PG_RETURN_POINTER(aout); } @@ -720,7 +720,7 @@ hstore_akeys(PG_FUNCTION_ARGS) } a = construct_array(d, count, - TEXTOID, -1, false, 'i'); + TEXTOID, -1, false, TYPALIGN_INT); PG_RETURN_POINTER(a); } @@ -767,7 +767,7 @@ hstore_avals(PG_FUNCTION_ARGS) } a = construct_md_array(d, nulls, 1, , , - TEXTOID, -1, false, 'i'); + TEXTOID, -1, false, TYPALIGN_INT); PG_RETURN_POINTER(a); } @@ -819,7 +819,7 @@ hstore_to_array_internal(HStore *hs, int ndims)
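The attached diff only shows `TYPALIGN_INT` in use; a sketch of what the full set of macros might look like follows. Only the `'i'` mapping is confirmed by the patch hunks above; the other names and the one-letter codes (pg_type's documented alignment characters) are my extrapolation of the obvious pattern:

```c
/*
 * Symbolic names for pg_type.typalign values, replacing bare character
 * literals like 'i' so that relevant code becomes greppable.  The
 * TYPSTORAGE_* codes ('p'/'e'/'x'/'m') would get analogous treatment.
 */
#define TYPALIGN_CHAR	'c'		/* char alignment (i.e., unaligned) */
#define TYPALIGN_SHORT	's'		/* short (2-byte) alignment */
#define TYPALIGN_INT	'i'		/* int (4-byte) alignment */
#define TYPALIGN_DOUBLE	'd'		/* double (8-byte) alignment */
```

With these in place, a call such as `deconstruct_array(..., TYPALIGN_INT, ...)` says what it means, and `grep TYPALIGN_INT` finds every caller that assumes 4-byte alignment.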
Re: Portal->commandTag as an enum
> On Mar 2, 2020, at 1:57 PM, Alvaro Herrera wrote: > > On 2020-Mar-02, Mark Dilger wrote: > >> I think it is more natural to change event trigger code to reason >> natively about CommandTags rather than continuing to reason about >> strings. The conversion back-and-forth between the enum and the >> string representation serves no useful purpose that I can see. But I >> understand if you are just trying to have the patch change fewer parts >> of the code, and if you feel more comfortable about reverting that >> part of the patch, as the committer, I think that's your prerogative. > > Nah -- after reading it again, that would make no sense. With the > change, we're forcing all writers of event trigger functions in C to > adapt their functions to the new API, but that's okay -- I don't expect > there will be many, and we're doing other things to the API anyway. > > I pushed it now. Thanks! I greatly appreciate your time. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Allowing ALTER TYPE to change storage strategy
On Mon, Mar 02, 2020 at 02:11:10PM -0500, Tom Lane wrote: I started to look at this patch with fresh eyes after reading the patch for adding binary I/O for ltree, https://www.postgresql.org/message-id/flat/CANmj9Vxx50jOo1L7iSRxd142NyTz6Bdcgg7u9P3Z8o0=hgk...@mail.gmail.com and realizing that the only reasonable way to tackle that problem is to provide an ALTER TYPE command that can set the binary I/O functions for an existing type. (One might think that it'd be acceptable to UPDATE the pg_type row directly; but that wouldn't take care of dependencies properly, and it also wouldn't handle the domain issues I discuss below.) There are other properties too that can be set in CREATE TYPE but we have no convenient way to adjust later, though it'd be reasonable to want to do so. I do not think that we want to invent bespoke syntax for each property. The more such stuff we cram into ALTER TYPE, the bigger the risk of conflicting with some future SQL extension. Moreover, since CREATE TYPE for base types already uses the "( keyword = value, ... )" syntax for these properties, and we have a similar precedent in CREATE/ALTER OPERATOR, it seems to me that the syntax we want here is ALTER TYPE typename SET ( keyword = value [, ... ] ) Agreed, it seems reasonable to use the ALTER OPERATOR precedent. Attached is a stab at doing it that way, and implementing setting of the binary I/O functions for good measure. (It'd be reasonable to add more stuff, like setting the other support functions, but this is enough for the immediate discussion.) The main thing I'm not too happy about is what to do about domains. Your v2 patch supposed that it's okay to allow ALTER TYPE on domains, but I'm not sure we want to go that way, and if we do there's certainly a bunch more work that has to be done. Up to now the system has supposed that domains inherit all these properties from their base types.
I'm not certain exactly how far that assumption has propagated, but there's at least one place that implicitly assumes it: pg_dump has no logic for adjusting a domain to have different storage or support functions than the base type had. So as v2 stands, a custom storage option on a domain would be lost in dump/reload. Another issue that would become a big problem if we allow domains to have custom I/O functions is that the wire protocol transmits the base type's OID, not the domain's OID, for an output column that is of a domain type. A client that expected a particular output format on the strength of what it was told the column type was would be in for a big surprise. Certainly we could fix pg_dump if we had a mind to, but changing the wire protocol for this would have unpleasant ramifications. And I'm worried about whether there are other places in the system that are also making this sort of assumption. I'm also not very convinced that we *want* to allow domains to vary from their base types in this way. The primary use-case I can think of for ALTER TYPE SET is in extension update scripts, and an extension would almost surely wish for any domains over its type to automatically absorb whatever changes of this sort it wants to make. So I think there are two distinct paths we could take here: * Decide that it's okay to allow domains to vary from their base type in these properties. Teach pg_dump to cope with that, and stand ready to fix any other bugs we find, and have some story to tell the people whose clients we break. Possibly add a CASCADE option to ALTER TYPE SET, with the semantics of adjusting dependent domains to match. (This is slightly less scary than the CASCADE semantics you had in mind, because it would only affect pg_type entries not tables.) * Decide that we'll continue to require domains to match their base type in all these properties. 
That means refusing to allow ALTER on a domain per se, and automatically cascading these changes to dependent domains. In the v3 patch below, I've ripped out the ALTER DOMAIN syntax on the assumption that we'd do the latter; but I've not written the cascade recursion logic, because that seemed like a lot of work to do in advance of having consensus on it being a good idea. I do agree we should do the latter, i.e. maintain the assumption that domains have the same properties as their base type. I can't think of a use case for allowing them to differ, it just didn't occur to me there is this implicit assumption when writing the patch. I've also restricted the code to work just on base types, because it's far from apparent to me that it makes any sense to allow any of these operations on derived types such as composites or ranges. Again, there's a fair amount of code that is not going to be prepared for such a type to have properties that it could not have at creation, and I don't see a use-case that really justifies breaking those expectations. Yeah, that makes sense too, I think. regards -- Tomas Vondra http://www.2ndQuadrant.com
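For concreteness, an extension update script using the syntax discussed above might look like this (illustrative only: the option names follow the CREATE TYPE keyword list, and the function names are hypothetical stand-ins for the ltree case that motivated the patch):

```sql
-- Hypothetical extension update script fragment: attach binary I/O
-- functions to a pre-existing type via the proposed ALTER TYPE ... SET
ALTER TYPE ltree SET (
    receive = ltree_recv,
    send    = ltree_send
);
```

Under the second path above, such a command would also cascade to any domains defined over the type, keeping them in sync with the base type's properties.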
Re: Portal->commandTag as an enum
On 2020-Mar-02, Mark Dilger wrote: > I think it is more natural to change event trigger code to reason > natively about CommandTags rather than continuing to reason about > strings. The conversion back-and-forth between the enum and the > string representation serves no useful purpose that I can see. But I > understand if you are just trying to have the patch change fewer parts > of the code, and if you feel more comfortable about reverting that > part of the patch, as the committer, I think that's your prerogative. Nah -- after reading it again, that would make no sense. With the change, we're forcing all writers of event trigger functions in C to adapt their functions to the new API, but that's okay -- I don't expect there will be many, and we're doing other things to the API anyway. I pushed it now. I made very small changes before pushing: notably, I removed the InitializeQueryCompletion() call from standard_ProcessUtility; instead its callers are supposed to do it. Updated ProcessUtility's comment to that effect. Also, the affected plancache.c functions (CreateCachedPlan and CreateOneShotCachedPlan) had not had their comments updated. Previously they received compile-time constants, but that was important because they were strings. No longer. I noticed some other changes that could perhaps be made here, but didn't do them; for instance, in pg_stat_statement we have a comparison to CMDTAG_COPY that seems pointless; we could just acquire the value always. I left it alone for now but I think the change is without side effects (notably so because most actual DML does not go through ProcessUtility anyway). Also, event_trigger.c could resolve command strings to the tag enum earlier. There's also a lot of nonsense in the pquery.c functions, such as this, /* * Now fetch desired portion of results. 
*/ nprocessed = PortalRunSelect(portal, true, count, dest); /* * If the portal result contains a command tag and the caller * gave us a pointer to store it, copy it and update the * rowcount. */ if (qc && portal->qc.commandTag != CMDTAG_UNKNOWN) { CopyQueryCompletion(qc, &portal->qc); qc->nprocessed = nprocessed; } I think we could simplify that by passing the qc. Similar consideration with DoCopy; instead of a 'uint64 nprocessed' we could have a *qc to fill in and avoid this bit of silliness, DoCopy(pstate, (CopyStmt *) parsetree, pstmt->stmt_location, pstmt->stmt_len, &processed); if (qc) SetQueryCompletion(qc, CMDTAG_COPY, processed); I see no reason to have PortalRun() initialize the qc; ISTM that its callers should do so. And so on. Nothing of that is critical. Thanks for your patch, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
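For readers following along, the QueryCompletion mechanism being discussed can be modeled in a few lines. This is a simplified standalone sketch of the idea (a tag enum plus a row count, turned into a string only at the very end in EndCommand()), not the actual tcop code; the tag values here are stand-ins:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the QueryCompletion struct from the commandTag
 * patch: completed-query info is kept as an enum tag plus a count,
 * rather than as a formatted cstring. Tag values are stand-ins. */
typedef enum CommandTag
{
	CMDTAG_UNKNOWN,
	CMDTAG_COPY,
	CMDTAG_SELECT
} CommandTag;

typedef struct QueryCompletion
{
	CommandTag	commandTag;
	uint64_t	nprocessed;
} QueryCompletion;

/* Fill in a completion in one step, as DoCopy's caller does above. */
static void
SetQueryCompletion(QueryCompletion *qc, CommandTag tag, uint64_t nprocessed)
{
	qc->commandTag = tag;
	qc->nprocessed = nprocessed;
}

/* Copy a completion from the portal into the caller-supplied slot. */
static void
CopyQueryCompletion(QueryCompletion *dst, const QueryCompletion *src)
{
	dst->commandTag = src->commandTag;
	dst->nprocessed = src->nprocessed;
}
```

The simplification proposed in the email amounts to passing a QueryCompletion pointer down instead of returning a bare row count and re-assembling the completion afterwards.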
Re: [PATCH] kNN for btree
On Tue, Dec 3, 2019 at 3:00 AM Alexander Korotkov wrote: > On Mon, Sep 9, 2019 at 11:28 PM Alexander Korotkov > wrote: > > On Sun, Sep 8, 2019 at 11:35 PM Alexander Korotkov > > wrote: > > > I'm going to push 0001 changing "attno >= 1" to assert. > > > > Pushed. Rebased patchset is attached. I propose to limit > > consideration during this commitfest to this set of 7 remaining > > patches. The rest of patches could be considered later. I made some > > minor editing in preparation to commit. But I decided I've a couple > > more notes to Nikita. > > > > * 0003 extracts part of fields from BTScanOpaqueData struct into new > > BTScanStateData struct. However, there is a big comment regarding > > BTScanOpaqueData just before definition of BTScanPosItem. It needs to > > be revised. > > * 0004 adds "knnState" field to BTScanOpaqueData in addition to > > "state" field. Wherein "knnState" might be unused during knn scan if it > > could be done in one direction. This seems counter-intuitive. Could > > we rework this to have "forwardState" and "backwardState" fields > > instead of "state" and "knnState"? > > I have reordered patchset into fewer more self-consistent patches. > > Besides comments, code beautification and other improvements, now > there are dedicated fields for forward and backward scan states. The > forward scan state is the pointer to data structure, which is used in > ordinary unidirectional scan. So, it's mostly cosmetic change, but it > improves the code readability. > > One thing bothers me. We have to move scalar distance operators from > btree_gist to core. btree_gist extension upgrade script has to > qualify operators with schema in order to distinguish core and > extension implementations. So, I have to use @extschema@. But it's > forbidden for relocatable extensions. For now, I marked btree_gist as > non-relocatable. Our comment in extension.c says "For a relocatable > extension, we needn't do this.
There cannot be any need for > @extschema@, else it wouldn't be relocatable.". Is it really true? I > think now we have a pretty valid example for a relocatable extension, > which needs @extschema@ in its upgrade script. Any thoughts? I've rebased the patchset to the current master and made some refactoring. I hope it would be possible to bring it to committable shape during this CF. This needs more refactoring, though. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0001-Introduce-IndexAmRoutine.ammorderbyopfirstcol-v17.patch.gz Description: GNU Zip compressed data 0002-Allow-ordering-by-operator-in-ordered-indexes-v17.patch.gz Description: GNU Zip compressed data 0004-Move-scalar-distance-operators-from-btree_gist-t-v17.patch.gz Description: GNU Zip compressed data 0003-Extract-BTScanState-from-BTScanOpaqueData-v17.patch.gz Description: GNU Zip compressed data 0005-Add-knn-support-to-btree-indexes-v17.patch.gz Description: GNU Zip compressed data
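The forward/backward scan-state merge that the patchset implements over btree scan positions can be illustrated with an in-memory sketch. This is not the patch's code; a sorted array stands in for the index, and two cursors play the roles of the forward and backward scan states:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the kNN-by-bidirectional-scan idea: position a backward
 * cursor (left) and a forward cursor (right) around the target value,
 * then repeatedly emit whichever side is closer. A btree kNN scan does
 * the same merge over two index scan states. */
static size_t
knn_sorted(const int *vals, size_t n, int target, int *out)
{
	size_t		right = 0;
	ptrdiff_t	left;
	size_t		k = 0;

	/* find first element >= target; backward cursor starts just before */
	while (right < n && vals[right] < target)
		right++;
	left = (ptrdiff_t) right - 1;

	while (left >= 0 || right < n)
	{
		if (left < 0)
			out[k++] = vals[right++];
		else if (right >= n)
			out[k++] = vals[left--];
		else if (target - vals[left] <= vals[right] - target)
			out[k++] = vals[left--];
		else
			out[k++] = vals[right++];
	}
	return k;
}
```

Stopping the merge after the first k emitted values yields the k nearest neighbors without visiting the rest of the index, which is the point of pushing ORDER BY distance-operator clauses into the btree AM.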
Re: Use compiler intrinsics for bit ops in hash
Hi David, On Wed, Feb 26, 2020 at 9:56 PM David Fetter wrote: > > On Wed, Feb 26, 2020 at 09:12:24AM +0100, David Fetter wrote: > > On Fri, Jan 31, 2020 at 04:59:18PM +0100, David Fetter wrote: > > > On Wed, Jan 15, 2020 at 03:45:12PM -0800, Jesse Zhang wrote: > > > > On Tue, Jan 14, 2020 at 2:09 PM David Fetter wrote: > > > > > > The changes in hash AM and SIMPLEHASH do look like a net positive > > > > > > improvement. My biggest cringe might be in pg_bitutils: > > > > > > > > > > > > 1. Is ceil_log2_64 dead code? > > > > > > > > > > Let's call it nascent code. I suspect there are places it could go, if > > > > > I look for them. Also, it seemed silly to have one without the other. > > > > > > > > > > > > > While not absolutely required, I'd like us to find at least one > > > > place and start using it. (Clang also nags at me when we have > > > > unused functions). > > > > > > Done in the expanded patches attached. I see that you've found use of it in dynahash, thanks! The math in the new (from v4 to v6) patch is wrong: it yields ceil_log2(1) = 1 or next_power_of_2(1) = 2. I can see that you lifted the restriction of "num greater than one" for ceil_log2() in this patch set, but it's now _more_ problematic to base those functions on pg_leftmost_one_pos(). I'm not comfortable with your changes to pg_leftmost_one_pos() to remove the restriction on word being non-zero. Specifically pg_leftmost_one_pos() is made to return 0 on 0 input. While none of its current callers (in HEAD) is harmed, this introduces muddy semantics: 1. pg_leftmost_one_pos is semantically undefined on 0 input: scanning for a set bit in a zero word won't find it anywhere. 2. we can _try_ generalizing it to accommodate ceil_log2 by extrapolating based on the invariant that BSR + LZCNT = 31 (or 63). In that case, the extrapolation yields -1 for pg_leftmost_one_pos(0). I'm not convinced that others on the list will be comfortable with the generalization suggested in 2 above. 
I've quickly put together a PoC patch on top of yours, which re-implements ceil_log2 using LZCNT coupled with a CPUID check. Thoughts? Cheers, Jesse From 0e4392d77b6132a508b7da14871cc99066a9d114 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Fri, 28 Feb 2020 16:22:04 -0800 Subject: [PATCH] Use LZCOUNT when possible This patch reverts the changes to pg_leftmost_one and friends (which is really bit-scan-reverse, and is legitimately undefined one zero) and reworks ceil_log2 and friends to use a count-leading-zeros operation that is well-defined on zero. --- src/include/port/pg_bitutils.h | 47 - src/port/pg_bitutils.c | 77 ++ 2 files changed, 104 insertions(+), 20 deletions(-) diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h index 88a9ea5b7fb..b4d5724ee1d 100644 --- a/src/include/port/pg_bitutils.h +++ b/src/include/port/pg_bitutils.h @@ -20,18 +20,19 @@ extern PGDLLIMPORT const uint8 pg_number_of_ones[256]; /* * pg_leftmost_one_pos32 * Returns the position of the most significant set bit in "word", - * measured from the least significant bit. + * measured from the least significant bit. word must not be 0. */ static inline int pg_leftmost_one_pos32(uint32 word) { #ifdef HAVE__BUILTIN_CLZ - return word ? 31 - __builtin_clz(word) : 0; + Assert(word != 0); + + return 31 - __builtin_clz(word); #else int shift = 32 - 8; - if (word == 0) - return 0; + Assert(word != 0); while ((word >> shift) == 0) shift -= 8; @@ -48,18 +49,19 @@ static inline int pg_leftmost_one_pos64(uint64 word) { #ifdef HAVE__BUILTIN_CLZ + Assert(word != 0); + #if defined(HAVE_LONG_INT_64) - return word ? 63 - __builtin_clzl(word) : 0; + return 63 - __builtin_clzl(word); #elif defined(HAVE_LONG_LONG_INT_64) - return word ? 
63 - __builtin_clzll(word) : 0; + return 63 - __builtin_clzll(word); #else #error must have a working 64-bit integer datatype #endif #else /* !HAVE__BUILTIN_CLZ */ int shift = 64 - 8; - if (word == 0) - return 0; + Assert(word != 0); while ((word >> shift) == 0) shift -= 8; @@ -71,18 +73,19 @@ pg_leftmost_one_pos64(uint64 word) /* * pg_rightmost_one_pos32 * Returns the position of the least significant set bit in "word", - * measured from the least significant bit. + * measured from the least significant bit. word must not be 0. */ static inline int pg_rightmost_one_pos32(uint32 word) { #ifdef HAVE__BUILTIN_CTZ - return word ? __builtin_ctz(word) : 32; + Assert(word != 0); + + return __builtin_ctz(word); #else int result = 0; - if (word == 0) - return 32; + Assert(word != 0); while ((word & 255) == 0) { @@ -102,18 +105,19 @@ static inline int pg_rightmost_one_pos64(uint64 word) { #ifdef HAVE__BUILTIN_CTZ + Assert(word != 0); + #if defined(HAVE_LONG_INT_64) - return word ? __builtin_ctzl(word) : 64; + return __builtin_ctzl(word); #elif defined(HAVE_LONG_LONG_INT_64) - return word ?
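To make the boundary cases in this discussion concrete: the intended semantics are ceil_log2(1) = 0 and next_power_of_2(1) = 1, which a leftmost-one-position primitive only delivers when applied to num - 1, never to zero. A small portable sketch (not the pg_bitutils code, and without the builtin/LZCNT fast paths):

```c
#include <assert.h>
#include <stdint.h>

/* Position of the most significant set bit, measured from the least
 * significant bit; undefined for word == 0, per the restriction argued
 * for above. */
static int
leftmost_one_pos32(uint32_t word)
{
	int			pos = 0;

	assert(word != 0);
	while (word >>= 1)
		pos++;
	return pos;
}

/* Smallest k with 2^k >= num. Note the num == 1 special case: deriving
 * it as leftmost_one_pos32(num) + something yields the wrong
 * ceil_log2(1) = 1 that the email points out. */
static int
ceil_log2_32(uint32_t num)
{
	assert(num >= 1);
	return (num == 1) ? 0 : leftmost_one_pos32(num - 1) + 1;
}

static uint32_t
next_power_of_2_32(uint32_t num)
{
	return ((uint32_t) 1) << ceil_log2_32(num);
}
```

With this formulation, the bit-scan primitive is never asked about a zero word, so its semantics can stay strictly "undefined on 0" as the PoC patch above proposes.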
Re: [PATCH] Erase the distinctClause if the result is unique by definition
On Tue, Mar 3, 2020 at 1:24 AM Andy Fan wrote: > Thank you Tom for the review! > > On Mon, Mar 2, 2020 at 4:46 AM Tom Lane wrote: > >> Andy Fan writes: >> > Please see if you have any comments. Thanks >> >> The cfbot isn't at all happy with this. Its linux build is complaining >> about a possibly-uninitialized variable, and then giving up: >> >> https://travis-ci.org/postgresql-cfbot/postgresql/builds/656722993 >> >> The Windows build isn't using -Werror, but it is crashing in at least >> two different spots in the regression tests: >> >> >> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.81778 >> >> I've not attempted to identify the cause of that. >> >> > Before I submit the patch, I can make sure "make check-world" is > successful, but > since the compile option is not same, so I didn't catch > the possibly-uninitialized > variable. As for the crash on the windows, I didn't get the enough > information > now, I will find a windows server and reproduce the cases. > > I just found the link http://commitfest.cputube.org/ this morning, I will > make sure > the next patch can go pass this test. > > > At a high level, I'm a bit disturbed that this focuses only on DISTINCT >> and doesn't (appear to) have any equivalent intelligence for GROUP BY, >> though surely that offers much the same opportunities for optimization. >> It seems like it'd be worthwhile to take a couple steps back and see >> if we couldn't recast the logic to work for both. >> >> > OK, Looks group by is a bit harder than distinct since the aggregation > function. > I will go through the code to see why to add this logic. > > Can we guarantee any_aggr_func(a) == a if only 1 row is returned? If so, we can do some work on the pathtarget/reltarget by transforming the Aggref to a raw expr. I checked the execution path of the aggregation call; it looks like it depends on the Agg node, which is the thing we want to remove.
> >> * There seem to be some pointless #include additions, eg in planner.c >> the added code doesn't look to justify any of them. Please also >> avoid unnecessary whitespace changes, those also slow down reviewing. >> > Fixed some typos. That may be because I added the header file at one point and then refactored the code but forgot to remove the header file when it was no longer necessary. Do we need to rely on experience to tell whether a header file is needed, or do we have any tool to tell it automatically? regards, Andy Fan
Re: Allowing ALTER TYPE to change storage strategy
I started to look at this patch with fresh eyes after reading the patch for adding binary I/O for ltree, https://www.postgresql.org/message-id/flat/CANmj9Vxx50jOo1L7iSRxd142NyTz6Bdcgg7u9P3Z8o0=hgk...@mail.gmail.com and realizing that the only reasonable way to tackle that problem is to provide an ALTER TYPE command that can set the binary I/O functions for an existing type. (One might think that it'd be acceptable to UPDATE the pg_type row directly; but that wouldn't take care of dependencies properly, and it also wouldn't handle the domain issues I discuss below.) There are other properties too that can be set in CREATE TYPE but we have no convenient way to adjust later, though it'd be reasonable to want to do so. I do not think that we want to invent bespoke syntax for each property. The more such stuff we cram into ALTER TYPE, the bigger the risk of conflicting with some future SQL extension. Moreover, since CREATE TYPE for base types already uses the "( keyword = value, ... )" syntax for these properties, and we have a similar precedent in CREATE/ALTER OPERATOR, it seems to me that the syntax we want here is ALTER TYPE typename SET ( keyword = value [, ... ] ) Attached is a stab at doing it that way, and implementing setting of the binary I/O functions for good measure. (It'd be reasonable to add more stuff, like setting the other support functions, but this is enough for the immediate discussion.) The main thing I'm not too happy about is what to do about domains. Your v2 patch supposed that it's okay to allow ALTER TYPE on domains, but I'm not sure we want to go that way, and if we do there's certainly a bunch more work that has to be done. Up to now the system has supposed that domains inherit all these properties from their base types. 
I'm not certain exactly how far that assumption has propagated, but there's at least one place that implicitly assumes it: pg_dump has no logic for adjusting a domain to have different storage or support functions than the base type had. So as v2 stands, a custom storage option on a domain would be lost in dump/reload. Another issue that would become a big problem if we allow domains to have custom I/O functions is that the wire protocol transmits the base type's OID, not the domain's OID, for an output column that is of a domain type. A client that expected a particular output format on the strength of what it was told the column type was would be in for a big surprise. Certainly we could fix pg_dump if we had a mind to, but changing the wire protocol for this would have unpleasant ramifications. And I'm worried about whether there are other places in the system that are also making this sort of assumption. I'm also not very convinced that we *want* to allow domains to vary from their base types in this way. The primary use-case I can think of for ALTER TYPE SET is in extension update scripts, and an extension would almost surely wish for any domains over its type to automatically absorb whatever changes of this sort it wants to make. So I think there are two distinct paths we could take here: * Decide that it's okay to allow domains to vary from their base type in these properties. Teach pg_dump to cope with that, and stand ready to fix any other bugs we find, and have some story to tell the people whose clients we break. Possibly add a CASCADE option to ALTER TYPE SET, with the semantics of adjusting dependent domains to match. (This is slightly less scary than the CASCADE semantics you had in mind, because it would only affect pg_type entries not tables.) * Decide that we'll continue to require domains to match their base type in all these properties. 
That means refusing to allow ALTER on a domain per se, and automatically cascading these changes to dependent domains. In the v3 patch below, I've ripped out the ALTER DOMAIN syntax on the assumption that we'd do the latter; but I've not written the cascade recursion logic, because that seemed like a lot of work to do in advance of having consensus on it being a good idea. I've also restricted the code to work just on base types, because it's far from apparent to me that it makes any sense to allow any of these operations on derived types such as composites or ranges. Again, there's a fair amount of code that is not going to be prepared for such a type to have properties that it could not have at creation, and I don't see a use-case that really justifies breaking those expectations. Thoughts? regards, tom lane diff --git a/doc/src/sgml/ref/alter_type.sgml b/doc/src/sgml/ref/alter_type.sgml index 67be1dd..3465d3f 100644 --- a/doc/src/sgml/ref/alter_type.sgml +++ b/doc/src/sgml/ref/alter_type.sgml @@ -30,6 +30,7 @@ ALTER TYPE name RENAME TO name SET SCHEMA new_schema ALTER TYPE name ADD VALUE [ IF NOT EXISTS ] new_enum_value [ { BEFORE | AFTER } neighbor_enum_value ] ALTER TYPE name RENAME VALUE existing_enum_value TO new_enum_value +ALTER TYPE name SET (
[PATCH] Documentation bug related to client authentication using TLS certificate
Hi, I found a documentation bug about client authentication using TLS certificates. When clientcert authentication is enabled in pg_hba.conf, libpq does not verify that the common name in the certificate matches the database username, as described in the documentation, before allowing the client connection. Instead, when sslmode is set to “verify-full”, libpq will verify that the server host name matches the common name in the server certificate. When sslmode is set to “verify-ca”, libpq will verify that the server is trustworthy by checking the certificate trust chain up to the root certificate, and it does not verify that the server hostname and certificate common name match in this case. The attached patch corrects the clientcert authentication description in the documentation. Cheers, Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca client_cert_auth.patch Description: Binary data
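For context, the clientcert check in question is configured on the server side in pg_hba.conf; an illustrative entry (addresses and scope are hypothetical, and option spellings vary somewhat by PostgreSQL version):

```
# Illustrative pg_hba.conf entry: the "cert" auth method authenticates
# a connection by validating the client certificate against the trusted
# CAs and matching its CN against the requested database user name.
hostssl  all  all  0.0.0.0/0  cert
```

The sslmode settings discussed in the message, by contrast, live on the libpq client side (e.g. in the connection string) and govern how the client validates the server's certificate.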
Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
On 2020-03-02 07:53, Michael Paquier wrote: + * For fixed-size files, the caller may pass the expected size as an + * additional crosscheck on successful recovery. If the file size is not + * known, set expectedSize = 0. + */ +int +RestoreArchivedWALFile(const char *path, const char *xlogfname, + off_t expectedSize, const char *restoreCommand) Actually, expectedSize is IMO a bad idea, because any caller of this routine passing down zero could be trapped with an incorrect file size. So let's remove the behavior where it is possible to bypass this sanity check. We don't need it in pg_rewind either. OK, sounds reasonable, but just to be clear: I will remove only the possibility of bypassing this sanity check (with 0), but leave the expectedSize argument intact. We still need it, since pg_rewind takes WalSegSz from ControlFile and should pass it further, am I right? + /* Remove trailing newline */ + if (strchr(cmd_output, '\n') != NULL) + *strchr(cmd_output, '\n') = '\0'; It seems to me that what you are looking for here is to use pg_strip_crlf(). Thinking harder, we have pipe_read_line() in src/common/exec.c which does the exact same job. pg_strip_crlf fits well, but would you mind if I also make pipe_read_line external in this patch? - /* -* construct the command to be executed -*/ Perhaps you meant "build" here. Actually, the verb 'construct' has historically been applied to archive/restore commands (see also xlogarchive.c and pgarch.c), but it should be 'build' in (fe_)archive.c, since we have BuildRestoreCommand there now. All other remarks are clear to me, so I will fix them in the next patch version, thanks. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com The Russian Postgres Company
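For reference, the behavior pg_strip_crlf() provides, and which the open-coded strchr() snippet above only partially implements, can be sketched as follows (a simplified standalone version, not the src/common implementation):

```c
#include <assert.h>
#include <string.h>

/* Strip trailing newline and carriage-return characters in place and
 * return the new length. Unlike the strchr('\n') snippet above, this
 * also handles "\r\n" endings and multiple trailing line terminators. */
static int
strip_crlf(char *str)
{
	int			len = (int) strlen(str);

	while (len > 0 && (str[len - 1] == '\n' || str[len - 1] == '\r'))
		str[--len] = '\0';
	return len;
}
```

This is exactly the kind of cleanup needed after reading a command's output through a pipe, which is why pipe_read_line() in src/common/exec.c is mentioned as already doing the same job.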
Re: [PATCH] Erase the distinctClause if the result is unique by definition
Thank you Tom for the review! On Mon, Mar 2, 2020 at 4:46 AM Tom Lane wrote: > Andy Fan writes: > > Please see if you have any comments. Thanks > > The cfbot isn't at all happy with this. Its linux build is complaining > about a possibly-uninitialized variable, and then giving up: > > https://travis-ci.org/postgresql-cfbot/postgresql/builds/656722993 > > The Windows build isn't using -Werror, but it is crashing in at least > two different spots in the regression tests: > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.81778 > > I've not attempted to identify the cause of that. > > Before I submit the patch, I can make sure "make check-world" is successful, but since the compile options are not the same, I didn't catch the possibly-uninitialized variable. As for the crash on Windows, I don't have enough information yet; I will find a Windows server and reproduce the cases. I just found the link http://commitfest.cputube.org/ this morning; I will make sure the next patch can pass this test. At a high level, I'm a bit disturbed that this focuses only on DISTINCT > and doesn't (appear to) have any equivalent intelligence for GROUP BY, > though surely that offers much the same opportunities for optimization. > It seems like it'd be worthwhile to take a couple steps back and see > if we couldn't recast the logic to work for both. > > OK. It looks like GROUP BY is a bit harder than DISTINCT because of the aggregation functions. I will go through the code to see how to add this logic. > Some other random comments: > > * Don't especially like the way you broke query_is_distinct_for() > into two functions, especially when you then introduced a whole > lot of other code in between. I did not expect this until you pointed it out. In this case, I had to break query_is_distinct_for into two functions, but it is true that we should put the two functions together. That's just making reviewer's job > harder to see what changed.
It makes the comments a bit disjointed > too, that is where you even had any. (Zero introductory comment > for query_is_distinct_agg is *not* up to project coding standards. > There are a lot of other undercommented places in this patch, too.) > > * Definitely don't like having query_distinct_through_join re-open > all the relations. The data needed for this should get absorbed > while plancat.c has the relations open at the beginning. (Non-nullness > of columns, in particular, seems like it'll be useful for other > purposes; I'm a bit surprised the planner isn't using that already.) > I can add new attributes to RelOptInfo and fill in the values in the get_relation_info call. > * In general, query_distinct_through_join seems hugely messy, expensive, > and not clearly correct. If it is correct, the existing comments sure > aren't enough to explain what it is doing or why. > Removing the relation_open call can make it a bit simpler; I will try to add more comments to make it clearer in the following patch. > * There seem to be some pointless #include additions, eg in planner.c > the added code doesn't look to justify any of them. Please also > avoid unnecessary whitespace changes, those also slow down reviewing. > > That may be because I added the header file at some point and then refactored the code later but forgot to remove the header file accordingly. Do we need to rely on experience to tell whether a header file is needed, or do we have any tool to tell it automatically? > * I see you decided to add a new regression test file select_distinct_2. > That's a poor choice of name because it conflicts with our rules for the > naming of alternative output files. Besides which, you forgot to plug > it into the test schedule files, so it isn't actually getting run. > Is there a reason not to just add the new test cases to select_distinct? > > Adding it to select_distinct.sql is ok for me as well. Actually I have no obvious reason to add the new file.
> * There are some changes in existing regression cases that aren't > visibly related to the stated purpose of the patch, eg it now > notices that "select distinct max(unique2) from tenk1" doesn't > require an explicit DISTINCT step. That's not wrong, but I wonder > if maybe you should subdivide this patch into more than one patch, > because that must be coming from some separate change. I'm also > wondering what caused the plan change in expected/join.out. > For my purposes it should be in the same patch; the logic here is that we have DISTINCT in the SQL and the query is already distinct because of the max function (the rule is defined in query_is_distinct_agg, which is split from the original query_is_distinct_for). > regards, tom lane >
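The core observation behind the non-aggregate part of the patch can be sketched in a few lines. This is an illustration of the idea only, not the patch's query_is_distinct_for()/query_distinct_through_join() code; integer column numbers stand in for real attribute metadata:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch: the DISTINCT step can be erased when all columns of some
 * unique, non-nullable index are contained in the DISTINCT column
 * list, because every output row is then already unique. */
static bool
col_in_list(const int *cols, int ncols, int col)
{
	for (int i = 0; i < ncols; i++)
		if (cols[i] == col)
			return true;
	return false;
}

static bool
distinct_is_redundant(const int *distinct_cols, int ndistinct,
					  const int *unique_index_cols, int nindex)
{
	for (int i = 0; i < nindex; i++)
		if (!col_in_list(distinct_cols, ndistinct, unique_index_cols[i]))
			return false;
	return true;
}
```

The aggregate case discussed above ("select distinct max(unique2) ...") is the other branch of the same reasoning: an ungrouped aggregate query produces at most one row, so its output is trivially distinct.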
Re: Portal->commandTag as an enum
> On Mar 2, 2020, at 9:08 AM, Alvaro Herrera wrote: > > On 2020-Mar-02, Alvaro Herrera wrote: > >> Here's the patch I propose for commit. I also rewrote the commit >> message. > > BTW I wonder if we should really change the definition of > EventTriggerData. ISTM that it would be sensible to keep it the same > for now ... I think it is more natural to change event trigger code to reason natively about CommandTags rather than continuing to reason about strings. The conversion back-and-forth between the enum and the string representation serves no useful purpose that I can see. But I understand if you are just trying to have the patch change fewer parts of the code, and if you feel more comfortable about reverting that part of the patch, as the committer, I think that's your prerogative. Did you want to do that yourself, or have me do it and resubmit? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Portal->commandTag as an enum
On 2020-Mar-02, Alvaro Herrera wrote: > Here's the patch I propose for commit. I also rewrote the commit > message. BTW I wonder if we should really change the definition of EventTriggerData. ISTM that it would be sensible to keep it the same for now ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Portal->commandTag as an enum
Here's the patch I propose for commit. I also rewrote the commit message. There are further refinements that can be done, but they don't need to be in the first patch. Notably, the event trigger code can surely do a lot better now by translating the tag list to a bitmapset earlier. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 5e5605ec913fb4ea89665debc57d430bbda34b8f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Mon, 2 Mar 2020 13:39:16 -0300
Subject: [PATCH v7] Migrating commandTag from string to enum.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The backend was using strings to represent command tags and doing string comparisons in multiple places, but that's slow and unhelpful. Create a new command list with a supporting structure to use instead; this is stored in a tag-list-file that can be tailored to specific purposes with a caller-definable C macro, similar to what we do for WAL resource managers. The first such uses are a new CommandTag enum and a CommandTagBehavior struct.

Replace numerous occurrences of char *completionTag with a QueryCompletion struct so that the code no longer stores information about completed queries in a cstring. Only at the last moment, in EndCommand(), does this get converted to a string. EventTriggerCacheItem no longer holds an array of palloc'd tag strings in sorted order, but rather just a Bitmapset over the CommandTags.
Author: Mark Dilger
Reviewed-by: John Naylor, Tom Lane, Álvaro Herrera
Discussion: https://postgr.es/m/981a9db4-3f0c-4da5-88ad-cb9cff4d6...@enterprisedb.com
---
 .../pg_stat_statements/pg_stat_statements.c |  18 +-
 contrib/sepgsql/hooks.c                     |   6 +-
 doc/src/sgml/event-trigger.sgml             |   2 +-
 src/backend/commands/createas.c             |  14 +-
 src/backend/commands/event_trigger.c        | 160 +
 src/backend/commands/matview.c              |   2 +-
 src/backend/commands/portalcmds.c           |  16 +-
 src/backend/commands/prepare.c              |   4 +-
 src/backend/executor/execMain.c             |   4 +-
 src/backend/executor/functions.c            |   4 +-
 src/backend/executor/spi.c                  |  22 +-
 src/backend/replication/logical/decode.c    |   2 +-
 src/backend/replication/walsender.c         |  18 +-
 src/backend/tcop/Makefile                   |   1 +
 src/backend/tcop/cmdtag.c                   |  98 +++
 src/backend/tcop/dest.c                     |  32 +-
 src/backend/tcop/postgres.c                 |  24 +-
 src/backend/tcop/pquery.c                   | 112 ++--
 src/backend/tcop/utility.c                  | 560 +-
 src/backend/utils/cache/evtcache.c          |  30 +-
 src/backend/utils/cache/plancache.c         |   4 +-
 src/backend/utils/mmgr/portalmem.c          |   6 +-
 src/include/commands/createas.h             |   3 +-
 src/include/commands/event_trigger.h        |   3 +-
 src/include/commands/matview.h              |   2 +-
 src/include/commands/portalcmds.h           |   2 +-
 src/include/commands/prepare.h              |   2 +-
 src/include/tcop/cmdtag.h                   |  56 ++
 src/include/tcop/cmdtaglist.h               | 218 +++
 src/include/tcop/dest.h                     |   6 +-
 src/include/tcop/pquery.h                   |   2 +-
 src/include/tcop/utility.h                  |  15 +-
 src/include/utils/evtcache.h                |   3 +-
 src/include/utils/plancache.h               |   7 +-
 src/include/utils/portal.h                  |   6 +-
 src/pl/plperl/plperl.c                      |   2 +-
 src/pl/plpgsql/src/pl_exec.c                |  10 +-
 src/pl/tcl/pltcl.c                          |   3 +-
 .../test_ddl_deparse/test_ddl_deparse.c     |   2 +-
 39 files changed, 873 insertions(+), 608 deletions(-)
 create mode 100644 src/backend/tcop/cmdtag.c
 create mode 100644 src/include/tcop/cmdtag.h
 create mode 100644 src/include/tcop/cmdtaglist.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index e4fda4b404..7b8e690c95 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -307,7 +307,7 @@ static void pgss_ExecutorEnd(QueryDesc *queryDesc);
 static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 					ProcessUtilityContext context, ParamListInfo params,
 					QueryEnvironment *queryEnv,
-					DestReceiver *dest, char *completionTag);
+					DestReceiver *dest, QueryCompletion *qc);
 static uint64 pgss_hash_string(const char *str, int len);
 static void pgss_store(const char *query, uint64 queryId,
 					int query_location, int query_len,
@@ -960,7 +960,7 @@ static void
 pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
Re: Portal->commandTag as an enum
On 2020-Mar-02, Mark Dilger wrote: > > I don't > > understand your point of pg_stats_sql having to deal with this in a > > particular way. How is that patch obtaining the command tags? I would > > hope it calls GetCommandTagName() rather than call CommandEnd, but maybe > > I misunderstand where it hooks. > > My objection was based on my misunderstanding of what Tom was requesting. > > I can rework the patch the way Tom suggests. I already did it :-) Posting in a jiffy -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Portal->commandTag as an enum
> On Mar 2, 2020, at 8:12 AM, Alvaro Herrera wrote: > > On 2020-Feb-29, Mark Dilger wrote: > >> You may want to think about the embedding of InvalidOid into the EndCommand >> output differently from how you think about the embedding of the rowcount >> into the EndCommand output, but my preference is to treat these issues the >> same and make a strong distinction between the commandtag and the embedded >> oid and/or rowcount. It's hard to say how many future features would be >> crippled by having the embedded InvalidOid in the commandtag, but as an >> example *right now* in the works, we have a feature to count how many >> commands of a given type have been executed. It stands to reason that >> feature, whether accepted in its current form or refactored, would not want >> to show users a pg_stats table like this: >> >> cnt command >> - >> 5 INSERT 0 >> 37 SELECT >> >> What the heck is the zero doing after the INSERT? That's the hardcoded >> InvalidOid that you are apparently arguing for. You could get around that >> by having the pg_sql_stats patch have its own separate set of command tag >> strings, but why would we intentionally design that sort of duplication into >> the solution? > > This is what I think Tom means to use in EndCommand: > > if (command_tag_display_rowcount(tag) && !force_undecorated_output) > snprintf(completionTag, COMPLETION_TAG_BUFSIZE, > tag == CMDTAG_INSERT ? > "%s 0 " UINT64_FORMAT : "%s " UINT64_FORMAT, > tagname, qc->nprocessed); > else > ... no rowcount ... > > The point is not to change the returned tag in any way -- just to make > the method to arrive at it not involve the additional data column in the > data file, instead hardcode the behavior in EndCommand. Thanks, Álvaro, I think I get it now. I thought Tom was arguing to have "INSERT 0" rather than "INSERT" be the commandtag. > I don't > understand your point of pg_stats_sql having to deal with this in a > particular way. How is that patch obtaining the command tags?
> I would hope it calls GetCommandTagName() rather than call CommandEnd, but maybe > I misunderstand where it hooks. My objection was based on my misunderstanding of what Tom was requesting. I can rework the patch the way Tom suggests. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Portal->commandTag as an enum
On 2020-Feb-29, Mark Dilger wrote: > You may want to think about the embedding of InvalidOid into the EndCommand > output differently from how you think about the embedding of the rowcount > into the EndCommand output, but my preference is to treat these issues the > same and make a strong distinction between the commandtag and the embedded > oid and/or rowcount. It's hard to say how many future features would be > crippled by having the embedded InvalidOid in the commandtag, but as an > example *right now* in the works, we have a feature to count how many > commands of a given type have been executed. It stands to reason that > feature, whether accepted in its current form or refactored, would not want > to show users a pg_stats table like this: > > cnt command > - > 5 INSERT 0 > 37 SELECT > > What the heck is the zero doing after the INSERT? That's the hardcoded > InvalidOid that you are apparently arguing for. You could get around that by > having the pg_sql_stats patch have its own separate set of command tag > strings, but why would we intentionally design that sort of duplication into > the solution? This is what I think Tom means to use in EndCommand: if (command_tag_display_rowcount(tag) && !force_undecorated_output) snprintf(completionTag, COMPLETION_TAG_BUFSIZE, tag == CMDTAG_INSERT ? "%s 0 " UINT64_FORMAT : "%s " UINT64_FORMAT, tagname, qc->nprocessed); else ... no rowcount ... The point is not to change the returned tag in any way -- just to make the method to arrive at it not involve the additional data column in the data file, instead hardcode the behavior in EndCommand. I don't understand your point of pg_stats_sql having to deal with this in a particular way. How is that patch obtaining the command tags? I would hope it calls GetCommandTagName() rather than call CommandEnd, but maybe I misunderstand where it hooks. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Replica sends an incorrect epoch in its hot standby feedback to the Master
On Fri, Feb 21, 2020 at 8:15 PM Palamadai, Eka wrote: Please, do not top post on this list. > Thanks a lot for the feedback. Please let me know if you have any further > comments. Meanwhile, I have also added this patch to "Commitfest 2020-03" > at https://commitfest.postgresql.org/27/2464. Apparently, there are a couple of duplicate entries in the commitfest as: https://commitfest.postgresql.org/27/2463/ and https://commitfest.postgresql.org/27/2462/ Could you close those as "withdrawn"? Regards, Juan José Santamaría Flecha
Re: Auxiliary Processes and MyAuxProc
On Sat, Feb 29, 2020 at 9:51 PM Mike Palmiotto wrote: > > On Sat, Nov 30, 2019 at 9:15 PM Michael Paquier wrote: > > > > On Thu, Oct 03, 2019 at 11:39:37AM -0700, Andres Freund wrote: > > > Color me unconvinced. > > > > The latest comments of the thread have not been addressed yet, so I am > > marking the patch as returned with feedback. If you think that's not > > correct, please feel free to update the status of the patch. If you > > do so, please provide at the same time a rebased version of the patch, > > as it is failing to apply on HEAD. > > I've finally had some time to update the patchset with an attempt at > addressing Andres' concerns. Note that the current spin has a bug > which does not allow syslogger to run properly. Additionally, it is > squashed into one patch again. I intend to split the patch out into > separate functional patches and fix the syslogger issue, but wanted to > get this on the ticket for the open CommitFest before it closes. I'll > go ahead and re-add it and will be pushing updates as I have them. Okay, here is an updated and rebased patch that passes all regression tests with and without EXEC_BACKEND. This also addresses more of the issues raised by Andres. I still need to do the following:
- split giant patch out into separate functional commits
- add translated process descriptions
- fix some variable names here and there (notably the PgSubprocess struct ".progname" member -- that's a misnomer)
- address some valgrind findings
I should be able to split this out into smaller commits sometime today and will continue iterating to scratch the other items off the list. Thanks, -- Mike Palmiotto https://crunchydata.com

From a497a8905fa7f25d41f831e83174548315fa658f Mon Sep 17 00:00:00 2001
From: Mike Palmiotto
Date: Tue, 19 Feb 2019 15:29:33 +0000
Subject: [PATCH 2/2] Add a hook to allow extensions to set worker metadata

This patch implements a hook that allows extensions to modify a worker process' metadata in backends.
Additional work done by Yuli Khodorkovskiy
Original discussion here: https://www.postgresql.org/message-id/flat/CAMN686FE0OdZKp9YPO%3DhtC6LnA6aW4r-%2Bjq%3D3Q5RAoFQgW8EtA%40mail.gmail.com
---
 src/backend/postmaster/fork_process.c | 11 +++
 src/include/postmaster/fork_process.h |  4
 2 files changed, 15 insertions(+)

diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index def3cee37e..e392f5207e 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -22,6 +22,14 @@
 #include "postmaster/fork_process.h"
 
+/*
+ * This hook allows plugins to get control of the child process following
+ * a successful fork_process(). It can be used to perform some action in
+ * extensions to set metadata in backends (including special backends) upon
+ * setting of the session user.
+ */
+ForkProcess_post_hook_type ForkProcess_post_hook = NULL;
+
 #ifndef WIN32
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
@@ -114,6 +122,9 @@ fork_process(void)
 #ifdef USE_OPENSSL
 		RAND_cleanup();
 #endif
+
+		if (ForkProcess_post_hook)
+			(*ForkProcess_post_hook) ();
 	}
 
 	return result;

diff --git a/src/include/postmaster/fork_process.h b/src/include/postmaster/fork_process.h
index a42f859a08..b4d1560b72 100644
--- a/src/include/postmaster/fork_process.h
+++ b/src/include/postmaster/fork_process.h
@@ -20,4 +20,8 @@ extern pid_t fork_process(void);
 extern pid_t postmaster_forkexec(int argc, char *argvp[]);
 #endif
 
+/* Hook for plugins to get control after a successful fork_process() */
+typedef void (*ForkProcess_post_hook_type) ();
+extern PGDLLIMPORT ForkProcess_post_hook_type ForkProcess_post_hook;
+
 #endif							/* FORK_PROCESS_H */
-- 
2.21.0

From 682aaa548249be06a6693839f1877946de8ca927 Mon Sep 17 00:00:00 2001
From: Mike Palmiotto
Date: Fri, 27 Sep 2019 12:28:19 -0400
Subject: [PATCH 1/2] Introduce subprocess infrastructure

This is an initial attempt at coalescing all of the postmaster subprocess startups into one central framework. The patchset introduces a MySubprocess global, which contains a PgSubprocess entry from the process_types array. Each entry has defined entrypoints, cleanup functions, prep functions, and a few other control-oriented variables. This is a rework of https://commitfest.postgresql.org/25/2259/, which attempts to address the feedback from Andres Freund.
---
 src/backend/bootstrap/bootstrap.c     | 103 +--
 src/backend/main/main.c               |   1 +
 src/backend/postmaster/Makefile       |   1 +
 src/backend/postmaster/autovacuum.c   | 170 +---
 src/backend/postmaster/bgworker.c     | 157 +++-
 src/backend/postmaster/bgwriter.c     |   2 +-
 src/backend/postmaster/checkpointer.c |   2 +-
 src/backend/postmaster/pgarch.c       |  83 +-
 src/backend/postmaster/pgstat.c       | 195 +
 src/backend/postmaster/postmaster.c   | 890 +++-
Re: [Proposal] Global temporary tables
On 2/27/20 9:43 AM, 曾文旌(义从) wrote: -- Scenario 2: Here I am getting the same error message in both the below cases. We may add a "global" keyword with the GTT-related error message. postgres=# create global temporary table gtt1 (c1 int unique); CREATE TABLE postgres=# create temporary table tmp1 (c1 int unique); CREATE TABLE postgres=# create temporary table tmp2 (c1 int references gtt1(c1) ); ERROR: constraints on temporary tables may reference only temporary tables postgres=# create global temporary table gtt2 (c1 int references tmp1(c1) ); ERROR: constraints on temporary tables may reference only temporary tables Fixed in global_temporary_table_v15-pg13.patch Thanks Wenjing. This below scenario is not working, i.e. even when 'on_commit_delete_rows' is true, rows are NOT removed after commit: postgres=# create global temp table foo1(n int) with (on_commit_delete_rows='true'); CREATE TABLE postgres=# begin; BEGIN postgres=*# insert into foo1 values (9); INSERT 0 1 postgres=*# insert into foo1 values (9); INSERT 0 1 postgres=*# select * from foo1; n --- 9 9 (2 rows) postgres=*# commit; COMMIT postgres=# select * from foo1; -- after commit - there should be 0 rows as on_commit_delete_rows is 'true' n --- 9 9 (2 rows) postgres=# \d+ foo1 Table "public.foo1" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description +-+---+--+-+-+--+- n | integer | | | | plain | | Access method: heap Options: on_commit_delete_rows=true postgres=# but if the user creates the table this way then it is working as expected: postgres=# create global temp table foo2(n int) on commit delete rows; CREATE TABLE postgres=# begin; insert into foo2 values (9); insert into foo2 values (9); commit; select * from foo2; BEGIN INSERT 0 1 INSERT 0 1 COMMIT n --- (0 rows) postgres=# I guess the problem is something with this syntax - create global temp table foo1(n int) with (on_commit_delete_rows='true'); -- regards, tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company
Re: Fastpath while arranging the changes in LSN order in logical decoding
Hi Dilip, On 2/18/20 7:30 PM, David Zhang wrote: After manually applied the patch, a diff regenerated is attached. David's updated patch applies but all logical decoding regression tests are failing on cfbot. Do you know when you will be able to supply an updated patch? Regards, -- -David da...@pgmasters.net
Re: Berserk Autovacuum (let's save next Mandrill)
On Tue, 2020-01-07 at 19:05 +0100, Tomas Vondra wrote: > This patch is currently in "needs review" state, but that seems quite > wrong - there's been a lot of discussions about how we might improve > behavior for append-only-tables, but IMO there's no clear consensus nor > a patch that we might review. > > So I think this should be either "waiting on author" or maybe "rejected > with feedback". Is there any chance of getting a reviewable patch in the > current commitfest? If not, I propose to mark it as RWF. > > I still hope we can improve this somehow in time for PG13. The policy is > not to allow new non-trivial patches in the last CF, but hopefully this > might be considered an old patch. I think that no conclusion was reached because there are *many* things that could be improved, and *many* interesting and ambitious ideas were vented. But I think it would be good to have *something* that addresses the immediate problem (INSERT-only tables are autovacuumed too late), as long as that does not have negative side-effects or block further improvements. I am not totally comfortable with the very simplistic approach of this patch (use the same metric to trigger autoanalyze and autovacuum), but what about this:
- a new table storage option autovacuum_vacuum_insert_threshold, perhaps a GUC of the same name, by default deactivated.
- if tabentry->tuples_inserted exceeds this threshold, but not one of the others, launch autovacuum with index_cleanup=off.
How would you feel about that? Yours, Laurenz Albe
Re: doc: vacuum full, fillfactor, and "extra space"
On 1/30/20 6:54 AM, Amit Kapila wrote: On Wed, Jan 29, 2020 at 9:10 PM Peter Eisentraut wrote: On 2020-01-20 06:30, Justin Pryzby wrote: Rebased against 40d964ec997f64227bc0ff5e058dc4a5770a70a9 I'm not sure that description of parallel vacuum in the middle of non-full vs. full vacuum is actually that good. I have done like that because parallel vacuum is the default. I mean when the user runs vacuum command, it will invoke workers to perform index cleanup based on some conditions. I think those sentences should be moved to a separate paragraph. It seems more natural to me to add immediately after vacuum explanation, but I might be wrong. After the above explanation, if you still think it is better to move into a separate paragraph, I can do that. Peter, do you still think this should be moved into a separate paragraph? Regards, -- -David da...@pgmasters.net
Re: Psql patch to show access methods info
Hi Alexander, On 1/21/20 5:37 PM, Alvaro Herrera wrote: On 2020-Jan-21, Alvaro Herrera wrote: c) it would be damn handy if \dAf (maybe \dAf+) lists the datatypes that each opfamily has opclasses for. Maybe make the output an array, like {int4,int8,numeric,...} Something like [*] but somehow make it prettier? Sorry, I forgot to copy-edit my text here: I said "make it prettier", but the query I submitted is already pretty enough ISTM; I had written that comment when I only had the array_agg() version, but then I changed it to string_agg() and that seems to have mostly done the trick. Maybe improve the format_type() bit to omit the quotes, if possible, but that doesn't seem a big deal. The last CF for PG13 has now started. Do you know when you'll be able to supply a new patch to address Álvaro's review? Regards, -- -David da...@pgmasters.net
Silence compiler warnings with Python 3.9
Starting with Python 3.9, the Python headers contain inline functions that fall afoul of our -Wdeclaration-after-statement coding style. In order to silence those warnings, I've added some GCC-specific contortions to disable that warning for Python.h only. Clang doesn't appear to warn about this at all; maybe it recognizes that this is an external header file. We could also write a configure check for this if we want to be more flexible. (Attempts to convince upstream to change the coding style were unsuccessful (https://bugs.python.org/issue39615).) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From e63a5935b7059d70418b63bd4e4b7ccb487e2075 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Mon, 2 Mar 2020 09:11:05 +0100
Subject: [PATCH] Silence compiler warnings with Python 3.9

Starting with Python 3.9, the Python headers contain inline functions that fall afoul of our -Wdeclaration-after-statement coding style, so add some contortions to disable that warning for Python.h only. Attempts to convince upstream to change the coding style were unsuccessful (https://bugs.python.org/issue39615).
---
 src/pl/plpython/plpython.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/pl/plpython/plpython.h b/src/pl/plpython/plpython.h
index 6d981a0a06..7f2256e9e2 100644
--- a/src/pl/plpython/plpython.h
+++ b/src/pl/plpython/plpython.h
@@ -42,6 +42,21 @@
 #undef vprintf
 #undef printf
 
+/*
+ * Starting with Python 3.9, the Python headers contain inline functions that
+ * fall afoul of our -Wdeclaration-after-statement coding style, so we need to
+ * do a little dance here to disable that warning momentarily.
+ */
+
+#ifdef __GNUC__
+#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
+#endif
+
+#if defined(__GNUC__) && GCC_VERSION >= 40600
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeclaration-after-statement"
+#endif
+
 #if defined(_MSC_VER) && defined(_DEBUG)
 /* Python uses #pragma to bring in a non-default libpython on VC++ if
  * _DEBUG is defined */
@@ -59,6 +74,10 @@
 #include <Python.h>
 #endif
 
+#if defined(__GNUC__) && GCC_VERSION >= 40600
+#pragma GCC diagnostic pop
+#endif
+
 /*
  * Python 2/3 strings/unicode/bytes handling. Python 2 has strings
  * and unicode, Python 3 has strings, which are unicode on the C
-- 
2.25.0
Re: Minor issues in .pgpass
On 2020/02/29 0:46, Hamid Akhtar wrote: The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: not tested Spec compliant: not tested Documentation: not tested First of all, this seems like fixing a valid issue; albeit the probability of somebody messing this up is low, it is still better to fix this problem. I've not tested the patch in any detail, however, there are a couple of comments I have before I proceed on with detailed testing. Thanks for the review and comments! 1. pgindent is showing a few issues with formatting. Please have a look and resolve those. Yes. 2. I think you can potentially use the "len" variable instead of introducing the "buflen" and "tmplen" variables. Basically I don't want to use the same variable for several purposes because that would decrease the code readability. Also, I would choose a more appropriate name for the "tmp" variable. Yeah, so what about "rest" as the variable name? I believe if you move the following lines before the conditional statement and simply change the if statement to "if (len >= sizeof(buf) - 1)", it will serve the purpose. ISTM that this doesn't work correctly when the "buf" contains trailing carriage returns but not newlines (i.e., this line is too long so the "buf" doesn't include a newline). In this case, pg_strip_crlf() shortens the "buf" and then its return value "len" becomes less than sizeof(buf). So the following condition always becomes false unexpectedly in that case even though there is still the rest of the line to eat. + if (len >= sizeof(buf) - 1) + { + char tmp[LINELEN]; Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
Re: Do we need to handle orphaned prepared transactions in the server?
Here is the v2 of the same patch after rebasing it and running it through pgindent. There are no other code changes. On Wed, Feb 19, 2020 at 8:04 PM Hamid Akhtar wrote: > All, > > Attached is version 1 of POC patch for notifying of orphaned > prepared transactions via warnings emitted to a client > application and/or log file. It applies to PostgreSQL branch > "master" on top of "e2e02191" commit. > > I've tried to keep the patch as less invasive as I could with > minimal impact on vacuum processes, so the performance impact > and the changes are minimal in that area of PostgreSQL core. > > > - What's in this Patch: > > This patch throws warnings when an autovacuum worker encounters > an orphaned prepared transaction. It also throws warnings to a > client when a vacuum command is issued. This patch also > introduces two new GUCs: > > (1) max_age_prepared_xacts > - The age after creation of a prepared transaction after which it > will be considered an orphan. > > (2) prepared_xacts_vacuum_warn_timeout > - The timeout period for an autovacuum (essentially any of its > worker) to check for orphaned prepared transactions and throw > warnings if any are found. > > > - What This Patch Does: > > If the GUCs are enabled (set to a value higher than -1), an > autovacuum worker running in the background checks if the > timeout has expired. If so, it checks if there are any orphaned > prepared transactions (i.e. their age has exceeded > max_age_prepared_xacts). If it finds any, it throws a warning for > every such transaction. It also emits the total number of orphaned > prepared transactions if one or more are found. > > When a vacuum command is issued from within a client, say psql, > in that case, we skip the vacuum timeout check and simply scan > for any orphaned prepared transactions. Warnings are emitted to > the client and log file if any are found. 
> > > - About the New GUCs: > > = max_age_prepared_xacts: > Sets the maximum age after which a prepared transaction is considered an > orphan. It applies when "prepared transactions" are enabled. The > age for a transaction is calculated from the time it was created to > the current time. If this value is specified without units, it is taken > as milliseconds. The default value is -1 which allows prepared > transactions to live forever. > > = prepared_xacts_vacuum_warn_timeout: > Sets the timeout after which vacuum starts throwing warnings for every > prepared transaction that has exceeded the maximum age defined by > "max_age_prepared_xacts". If this value is specified without units, > it is taken as milliseconds. The default value of -1 will disable > this warning mechanism. Setting too small a value could potentially fill > up the log with orphaned prepared transaction warnings, so this > parameter must be set to a value that is reasonably large to not > fill up the log file, but small enough to notify of long running and > potentially orphaned prepared transactions. There is no additional > timer or worker introduced with this change. Whenever a vacuum > worker runs, it first checks for any orphaned prepared transactions. > So at best, this GUC serves as a guideline for a vacuum worker > whether a warning should be thrown to the log file or a client issuing > a vacuum command. > > > - What this Patch Does Not Cover: > > The warning is not thrown when the user either runs vacuumdb or passes > individual relations to be vacuumed. Specifically in the case of vacuumdb, > it breaks down a vacuum command to an attribute-wise vacuum command. > So the vacuum command is indirectly run many times. Considering that > we want to emit warnings for every manual vacuum command, this simply > floods the terminal and log with orphaned prepared transaction > warnings. We could potentially handle that, but the overhead of > that seemed too much to me (and I've not invested any time looking > to fix that either). 
Hence, warnings are not thrown when user runs > vacuumdb and relation specific vacuum. > > > > On Fri, Jan 31, 2020 at 7:02 PM Hamid Akhtar > wrote: > >> >> >> On Thu, Jan 30, 2020 at 8:28 AM Craig Ringer >> wrote: >> >>> On Thu, 30 Jan 2020 at 02:04, Hamid Akhtar >>> wrote: >>> > >>> > So having seen the feedback on this thread, and I tend to agree with >>> most of what has been said here, I also agree that the server core isn't >>> really the ideal place to handle the orphan prepared transactions. >>> > >>> > Ideally, these must be handled by a transaction manager, however, I do >>> believe that we cannot let database suffer for failing of an external >>> software, and we did a similar change through introduction of idle in >>> transaction timeout behavior. >>> >>> The difference, IMO, is that idle-in-transaction aborts don't affect >>> anything we've promised to be durable. >>> >>> Once you PREPARE TRANSACTION the DB has made a promise that that txn >>> is durable. We don't have any consistent feedback channel to back to >>> applications and say "Hey, if you're not going to finish this up we >>> need to get
Re: Change atoi to strtol in same place
> On 11 Feb 2020, at 17:54, Alvaro Herrera wrote: > > On 2019-Dec-06, Joe Nelson wrote: > >> I see this patch has been moved to the next commitfest. What's the next >> step, does it need another review? > > This patch doesn't currently apply; it has conflicts with at least > 01368e5d9da7 and 7e735035f208; even in 7e735035f208^ it applies with > fuzz. Please post an updated version so that it can move forward. Ping. With the 2020-03 CommitFest now under way, are you able to supply a rebased patch for consideration? cheers ./daniel
Re: [BUG?] postgres_fdw incorrectly updates remote table if it has inherited children.
Fujita-san, > Unfortunately, I didn't have time to work on that (and won't in the > development cycle for PG13.) > > > Indeed, it is not an ideal query plan to execute for each updated rows... > > > > postgres=# explain select * from rtable_parent where tableoid = 126397 > > and ctid = '(0,11)'::tid; > >QUERY PLAN > > - > > Append (cost=0.00..5.18 rows=2 width=50) > >-> Seq Scan on rtable_parent (cost=0.00..1.15 rows=1 width=31) > > Filter: ((tableoid = '126397'::oid) AND (ctid = '(0,11)'::tid)) > >-> Tid Scan on rtable_child (cost=0.00..4.02 rows=1 width=68) > > TID Cond: (ctid = '(0,11)'::tid) > > Filter: (tableoid = '126397'::oid) > > (6 rows) > > IIRC, I think one of Tom's concerns about the solution I proposed was > that it added the tableoid restriction clause to the remote > UPDATE/DELETE query even if the remote table is not an inheritance > set. To add the clause only if the remote table is an inheritance > set, what I have in mind is to 1) introduce a new postgres_fdw table > option to indicate whether the remote table is an inheritance set or > not, and 2) determine whether to add the clause or not, using the > option. > I don't think a new option in postgres_fdw is a good solution, because the remote table structure is flexible regardless of the local configuration in the foreign-table options. People may add inherited child tables after the foreign table has been declared, which can create a configuration mismatch. Even if we always add a tableoid=OID restriction to the remote query, it shall be evaluated only after the TidScan has fetched the row pointed to by the ctid, so its additional cost is limited. And one potential benefit is that the tableoid=OID restriction can be used to prune unrelated partition leafs/inherited children at the planner stage. Probably, that is a separate piece of groundwork from postgres_fdw. Once the planner considers the built-in rule for this kind of optimization, the enhancement in postgres_fdw will be quite simple, I guess. What are your thoughts? 
Best regards, -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei
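The remote query shape under discussion — always appending the tableoid qual to the deparsed statement — can be sketched as below. This is a hypothetical simplification for illustration only; the function name and buffer handling are invented here, and postgres_fdw's real deparsing lives in contrib/postgres_fdw/deparse.c and works quite differently.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Hypothetical sketch: build the remote DELETE that postgres_fdw would
 * send, with the tableoid qual appended so that only the intended child
 * of a remote inheritance set is affected.  As noted above, the qual is
 * evaluated only after the TID scan fetches the row, so the added cost
 * per row is small. */
static int
build_remote_delete(char *buf, size_t buflen,
                    const char *relname, unsigned int reloid)
{
    return snprintf(buf, buflen,
                    "DELETE FROM %s WHERE ctid = $1 AND tableoid = %u",
                    relname, reloid);
}
```

Using the oid from the EXPLAIN example above, the generated statement would carry `tableoid = 126397`, matching only the intended remote child.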
Re: Planning counters in pg_stat_statements (using pgss_store)
On Mon, Mar 2, 2020 at 1:01 PM legrand legrand wrote:
>
> Julien Rouhaud wrote
> > On Sun, Mar 1, 2020 at 3:55 PM legrand legrand
> > <legrand_legrand@> wrote:
> >>
> >> >> I like the idea of adding a check for a non-zero queryId in the new
> >> >> pgss_planner_hook() (zero queryid shouldn't be reserved for
> >> >> utility_statements ?).
> >>
> >> > Some assert hit later, I can say that it's not always true. For
> >> > instance a CREATE TABLE AS won't run parse analysis for the underlying
> >> > query, as this has already been done for the original statement, but
> >> > will still call the planner. I'll change pgss_planner_hook to ignore
> >> > such cases, as pgss_store would otherwise think that it's a utility
> >> > statement. That'll probably incidentally fix the IVM incompatibility.
> >>
> >> Today with or without test on parse->queryId != UINT64CONST(0),
> >> CTAS is collected as a utility_statement without planning counter.
> >> This seems to me respecting the rule, not sure that this needs any
> >> new (risky) change to the actual (quite stable) patch.
> >
> > But the queryid ends up not being computed the same way:
> >
> > # select queryid, query, plans, calls from pg_stat_statements where
> > query like 'create table%';
> >        queryid       |             query              | plans | calls
> > ---------------------+--------------------------------+-------+-------
> >  8275950546884151007 | create table test as select 1; |     1 |     0
> >  7546197440584636081 | create table test as select 1  |     0 |     1
> > (2 rows)
> >
> > That's because CreateTableAsStmt->query doesn't have a query
> > location/len, as transformTopLevelStmt is only setting that for the
> > top level Query. That's probably an oversight in ab1f0c82257, but I'm
> > not sure what's the best way to fix that. Should we pass that
> > information to all transformXXX functions, or let transformTopLevelStmt
> > handle that.
> arf, this was not the case in my testing env (that is not up to date) :o(
> and would not have appeared at all with the proposed test on
> parse->queryId != UINT64CONST(0) ...

I'm not sure what exact behavior you had, but that shouldn't have changed since the previous version. The underlying query isn't a top-level statement, so maybe you didn't set pg_stat_statements.track = 'all'?
Re: Planning counters in pg_stat_statements (using pgss_store)
Julien Rouhaud wrote
> On Sun, Mar 1, 2020 at 3:55 PM legrand legrand
> <legrand_legrand@> wrote:
>>
>> >> I like the idea of adding a check for a non-zero queryId in the new
>> >> pgss_planner_hook() (zero queryid shouldn't be reserved for
>> >> utility_statements ?).
>>
>> > Some assert hit later, I can say that it's not always true. For
>> > instance a CREATE TABLE AS won't run parse analysis for the underlying
>> > query, as this has already been done for the original statement, but
>> > will still call the planner. I'll change pgss_planner_hook to ignore
>> > such cases, as pgss_store would otherwise think that it's a utility
>> > statement. That'll probably incidentally fix the IVM incompatibility.
>>
>> Today with or without test on parse->queryId != UINT64CONST(0),
>> CTAS is collected as a utility_statement without planning counter.
>> This seems to me respecting the rule, not sure that this needs any
>> new (risky) change to the actual (quite stable) patch.
>
> But the queryid ends up not being computed the same way:
>
> # select queryid, query, plans, calls from pg_stat_statements where
> query like 'create table%';
>        queryid       |             query              | plans | calls
> ---------------------+--------------------------------+-------+-------
>  8275950546884151007 | create table test as select 1; |     1 |     0
>  7546197440584636081 | create table test as select 1  |     0 |     1
> (2 rows)
>
> That's because CreateTableAsStmt->query doesn't have a query
> location/len, as transformTopLevelStmt is only setting that for the
> top level Query. That's probably an oversight in ab1f0c82257, but I'm
> not sure what's the best way to fix that. Should we pass that
> information to all transformXXX functions, or let transformTopLevelStmt
> handle that.

arf, this was not the case in my testing env (that is not up to date) :o(
and would not have appeared at all with the proposed test on
parse->queryId != UINT64CONST(0) ...

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: color by default
The patch to improve color support on Windows has been committed [1], and I would like to share some of the discussion there that might affect this patch.

- The documentation/comments could do a better job of explaining the case of PG_COLOR equal to 'always', explicitly saying that no checks are done on the output channel.

Aside from the decision about what the default coloring behaviour should be, there are parts of this patch that could be applied independently, as an improvement on the current state.

- The new function terminal_supports_color() should also apply when PG_COLOR is 'auto', to minimize the chances of seeing escape characters in the user terminal.

- The new entry in the documentation, especially as the PG_COLORS parameter seems to be currently undocumented. The programs that can use PG_COLOR would benefit from getting a link to it.

[1] https://www.postgresql.org/message-id/20200302064842.GE32059%40paquier.xyz

Regards,

Juan José Santamaría Flecha
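The PG_COLOR semantics discussed above can be modeled as follows. This is a hedged toy model, not the real implementation in src/common/logging.c: the function name `should_color` and the assumption that an unset variable means no color are simplifications for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model of the PG_COLOR decision: "always" colors unconditionally
 * (no check of the output channel, which is the documentation point
 * raised above), "auto" colors only when the output is a terminal,
 * and anything else -- including unset -- disables color. */
static bool
should_color(const char *pg_color, bool output_is_terminal)
{
    if (pg_color == NULL)
        return false;               /* assumed default: no color */
    if (strcmp(pg_color, "always") == 0)
        return true;                /* never checks the channel */
    if (strcmp(pg_color, "auto") == 0)
        return output_is_terminal;  /* isatty()-style check */
    return false;                   /* "never" or unrecognized */
}
```

The proposed terminal_supports_color() check would slot into the 'auto' branch, so escape sequences are suppressed on terminals that cannot render them.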
Re: Crash by targetted recovery
On 2020/02/28 12:13, Kyotaro Horiguchi wrote:

At Thu, 27 Feb 2020 20:04:41 +0900, Fujii Masao wrote in:

On 2020/02/27 17:05, Kyotaro Horiguchi wrote:

Thank you for the comment.

At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao wrote in:

On 2020/02/27 15:23, Kyotaro Horiguchi wrote:

I failed to understand why random access while reading from stream is a bad idea. Could you elaborate why?

It seems to me the word "streaming" suggests that WAL records should be read sequentially. Random access, which means reading from an arbitrary location, breaks a stream. (But the patch doesn't try to stop the wal sender if randAccess.)

Isn't it sufficient to set currentSource to 0 when disabling StandbyMode?

I thought that and it should work, but I hesitated to manipulate currentSource in StartupXLOG. currentSource is basically a private state of WaitForWALToBecomeAvailable. ReadRecord modifies it, but I think it's not good to modify it outside the logic in WaitForWALToBecomeAvailable.

If so, what about adding the following at the top of WaitForWALToBecomeAvailable()?

    if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
        currentSource = 0;

It works virtually the same way. I'm happy to do that if you don't agree to using randAccess. But I'd rather do that in the 'if (!InArchiveRecovery)' section.

The approach using randAccess seems unsafe. Please imagine the case where currentSource is changed to XLOG_FROM_ARCHIVE because randAccess is true, while walreceiver is still running. For example, this case can occur when the record at the REDO starting point is fetched with randAccess = true after walreceiver is invoked to fetch the last checkpoint record. The situation "currentSource != XLOG_FROM_STREAM while walreceiver is running" seems invalid. No?

When I mentioned a possibility of changing ReadRecord so that it modifies randAccess instead of currentSource, I thought that WaitForWALToBecomeAvailable should shut down the wal receiver as needed.
At Thu, 27 Feb 2020 15:23:07 +0900 (JST), Kyotaro Horiguchi wrote in:

me> location, breaks a stream. (But the patch doesn't try to stop wal
me> sender if randAccess.)

Sorry, I failed to notice that. And random access during StandbyMode usually (always?) lets RecPtr go back. I'm not sure WaitForWALToBecomeAvailable works correctly if we don't have a file in pg_wal and the REDO point is further back by more than a segment from the initial checkpoint record.

It works correctly. This is why WaitForWALToBecomeAvailable() uses the fetching_ckpt argument.

If we go back to XLOG_FROM_ARCHIVE by random access, it correctly re-connects to the primary for the past segment.

But this can lead to an unnecessary restart of walreceiver. Since fetching_ckpt ensures that the WAL file containing the REDO starting record exists in pg_wal, there is no need to reconnect to the primary when reading the REDO starting record. Is there any other case where we need to go back to XLOG_FROM_ARCHIVE by random access?

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
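The check Fujii-san proposed for the top of WaitForWALToBecomeAvailable() can be sketched in isolation as below. The enum values and function are simplified stand-ins for the real ones in xlog.c, used here only to show the intended state transition.

```c
#include <stdbool.h>

/* Simplified stand-ins for the XLOG_FROM_* values used in xlog.c;
 * 0 means "no source chosen yet". */
enum
{
    XLOG_FROM_ANY = 0,
    XLOG_FROM_ARCHIVE,
    XLOG_FROM_PG_WAL,
    XLOG_FROM_STREAM
};

/* Sketch of the proposed check: once standby mode is off, a lingering
 * XLOG_FROM_STREAM source is reset so that the normal source-selection
 * logic runs again on the next call, instead of continuing to read from
 * a stream that is no longer appropriate. */
static int
next_source(bool standby_mode, int current_source)
{
    if (!standby_mode && current_source == XLOG_FROM_STREAM)
        return XLOG_FROM_ANY;   /* forces re-selection of the WAL source */
    return current_source;
}
```

Keeping the reset inside WaitForWALToBecomeAvailable() preserves Horiguchi-san's point that currentSource is that function's private state, rather than manipulating it from StartupXLOG.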
Re: logical replication empty transactions
On Mon, Mar 2, 2020 at 9:01 AM Dilip Kumar wrote:
>
> On Sat, Nov 9, 2019 at 7:29 AM Euler Taveira wrote:
> >
> > Em seg., 21 de out. de 2019 às 21:20, Jeff Janes escreveu:
> > >
> > > After setting up logical replication of a slowly changing table using the
> > > built-in pub/sub facility, I noticed way more network traffic than made
> > > sense. Looking into it, I see that every transaction in that database on
> > > the master gets sent to the replica. 99.999+% of them are empty
> > > transactions ('B' message and 'C' message with nothing in between)
> > > because the transactions don't touch any tables in the publication, only
> > > non-replicated tables. Is doing it this way necessary for some reason?
> > > Couldn't we hold the transmission of 'B' until something else comes
> > > along, and then if that next thing is 'C' drop both of them?
> >
> > That is not optimal. Those empty transactions are a waste of bandwidth.
> > We can suppress them if no changes will be sent. test_decoding
> > implements "skip empty transaction" as you described above and I did
> > something similar to it. Patch is attached.
>
> I think this significantly reduces the network bandwidth for empty
> transactions. I have briefly reviewed the patch and it looks good to
> me.

One thing that is not clear to me is how we will advance restart_lsn if we don't send any empty xacts in a system where there are many such xacts. IIRC, the restart_lsn is advanced based on the confirmed_flush lsn sent by the subscriber. After this change, the subscriber won't be able to send the confirmed_flush, and for a long time we won't be able to advance restart_lsn. Is that correct, and if so, why do we think that is acceptable? One might argue that restart_lsn will be advanced as soon as we send the first non-empty xact, but I am not sure if that is good enough. What do you think?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
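The "skip empty transactions" idea Jeff describes — hold the 'B' message until something else arrives, and drop both 'B' and 'C' if nothing does — can be sketched as a toy state machine. This is a hedged illustration, not the test_decoding or pgoutput code; the struct and message letters are invented for the example.

```c
#include <stddef.h>

/* Toy model of an output-plugin stream that defers the begin message. */
typedef struct
{
    int  begin_pending;  /* 'B' decoded but not yet sent */
    char sent[32];       /* messages actually written to the stream */
    int  nsent;
} TxnStream;

static void
on_begin(TxnStream *s)
{
    s->begin_pending = 1;            /* defer sending 'B' */
}

static void
on_change(TxnStream *s)
{
    if (s->begin_pending)
    {
        s->sent[s->nsent++] = 'B';   /* flush the deferred begin */
        s->begin_pending = 0;
    }
    s->sent[s->nsent++] = 'I';       /* e.g. an insert message */
}

static void
on_commit(TxnStream *s)
{
    if (s->begin_pending)
    {
        s->begin_pending = 0;        /* empty txn: drop both 'B' and 'C' */
        return;
    }
    s->sent[s->nsent++] = 'C';
}
```

Note that in this model an all-empty workload sends nothing at all, which is exactly Amit's concern: with no traffic, the subscriber has nothing to acknowledge via confirmed_flush, so restart_lsn cannot advance.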
Re: [PATCH] Add schema and table names to partition error
On Mon, Mar 2, 2020 at 9:39 AM Amit Langote wrote: > > > > My (very limited) understanding is that partition > > "constraints" are entirely contained within pg_class.relpartbound of the > > partition. > > That is correct. > > > Are you suggesting that the table name go into the constraint name field > > of the error? > > Yes, that's what I was thinking, at least for "partition constraint > violated" errors, but given the above that would be a misleading use > of ErrorData.constraint_name. > I think it is better to use errtable in such cases. > Maybe it's not too late to invent a new error code like > ERRCODE_PARTITION_VIOLATION or such, then maybe we can use a > hard-coded name, say, just the string "partition constraint". > > > >> There are couple more instances in src/backend/command/tablecmds.c > > >> where partition constraint is checked: > > >> > > >> Maybe, better fix these too for completeness. > > > > Done. As there is no named constraint here, I used errtable again. > > > > > Right, if we want to make a change for this, then I think we can once > > > check all the places where we use error code ERRCODE_CHECK_VIOLATION. > > > > I've looked at every instance of this. It is used for 1) check > > constraints, 2) domain constraints, and 3) partition constraints. > > > > 1. errtableconstraint is used with the name of the constraint. > > 2. errdomainconstraint is used with the name of the constraint except in > > one instance which deliberately uses errtablecol. > > 3. With the attached patch, errtable is used except for one instance in > > src/backend/partitioning/partbounds.c described below. > > > > In check_default_partition_contents of > > src/backend/partitioning/partbounds.c, the default partition is checked > > for any rows that should belong in the partition being added _unless_ > > the leaf being checked is a foreign table. 
> > There are two tables mentioned in this warning, and I couldn't decide
> > which, if any, deserves to be in the error fields:
> >
> >     /*
> >      * Only RELKIND_RELATION relations (i.e. leaf partitions) need to be
> >      * scanned.
> >      */
> >     if (part_rel->rd_rel->relkind != RELKIND_RELATION)
> >     {
> >         if (part_rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
> >             ereport(WARNING,
> >                     (errcode(ERRCODE_CHECK_VIOLATION),
> >                      errmsg("skipped scanning foreign table \"%s\" which is a partition of default partition \"%s\"",
> >                             RelationGetRelationName(part_rel),
> >                             RelationGetRelationName(default_rel))));
>
> It seems strange to see that errcode here or any errcode for that
> matter, so we shouldn't really be concerned about this one.

Right.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
Hello

I reviewed the recently published patch. Looks good to me. One small note: the values for the new definitions in progress.h seem not to be aligned vertically. However, pgindent doesn't object.

regards, Sergei
Re: Cleanup - Removal of apply_typmod function in #if 0
On Mon, Mar 2, 2020 at 1:27 PM Peter Eisentraut wrote: > > On 2019-10-26 10:36, vignesh C wrote: > > One of the function apply_typmod in numeric.c file present within #if 0. > > This is like this for many years. > > I felt this can be removed. > > Attached patch contains the changes to handle removal of apply_typmod > > present in #if 0. > > committed > Thanks Peter for committing. Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: Allow to_date() and to_timestamp() to accept localized names
On Sun, Mar 1, 2020 at 11:25 PM Tom Lane wrote:
>
> > Barring objections, this seems
> > committable to me.
>
> Going once, going twice ...

You have moved this to a better place, so no objections from me, and thank you.

Regards,

Juan José Santamaría Flecha
Re: BUG #16108: Colorization to the output of command-line has unproperly behaviors at Windows platform
On Mon, Mar 2, 2020 at 7:48 AM Michael Paquier wrote: > > On top of that, and that's a separate issue, I have noticed that we > have exactly zero documentation about PG_COLORS (the plural flavor, > not the singular), but we have code for it in common/logging.c.. > Yeah, there is nothing about it prior to [1]. So, this conversation will have to be carried over there. > Anyway, committed down to 12, after tweaking a few things. > Thank you. [1] https://www.postgresql.org/message-id/bbdcce43-bd2e-5599-641b-9b44b9e0a...@2ndquadrant.com Regards, Juan José Santamaría Flecha
Re: Planning counters in pg_stat_statements (using pgss_store)
On Sun, Mar 1, 2020 at 3:55 PM legrand legrand wrote:
>
> >> I like the idea of adding a check for a non-zero queryId in the new
> >> pgss_planner_hook() (zero queryid shouldn't be reserved for
> >> utility_statements ?).
>
> > Some assert hit later, I can say that it's not always true. For
> > instance a CREATE TABLE AS won't run parse analysis for the underlying
> > query, as this has already been done for the original statement, but
> > will still call the planner. I'll change pgss_planner_hook to ignore
> > such cases, as pgss_store would otherwise think that it's a utility
> > statement. That'll probably incidentally fix the IVM incompatibility.
>
> Today with or without test on parse->queryId != UINT64CONST(0),
> CTAS is collected as a utility_statement without planning counter.
> This seems to me respecting the rule, not sure that this needs any
> new (risky) change to the actual (quite stable) patch.

But the queryid ends up not being computed the same way:

# select queryid, query, plans, calls from pg_stat_statements where query like 'create table%';
       queryid        |             query              | plans | calls
----------------------+--------------------------------+-------+-------
  8275950546884151007 | create table test as select 1; |     1 |     0
  7546197440584636081 | create table test as select 1  |     0 |     1
(2 rows)

That's because CreateTableAsStmt->query doesn't have a query location/len, as transformTopLevelStmt is only setting that for the top-level Query. That's probably an oversight in ab1f0c82257, but I'm not sure what's the best way to fix that. Should we pass that information to all transformXXX functions, or let transformTopLevelStmt handle that?
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
On 2020/02/26 23:18, Fujii Masao wrote:

On 2020/02/18 21:31, Fujii Masao wrote:

On 2020/02/18 16:53, Amit Langote wrote:

On Tue, Feb 18, 2020 at 4:42 PM Fujii Masao wrote:

On 2020/02/18 16:02, Amit Langote wrote:

I noticed that there is a missing tag in the documentation changes:

Could you tell me where I should add the tag?

+ + waiting for checkpoint to finish + + The WAL sender process is currently performing + pg_start_backup to set up for + taking a base backup, and waiting for backup start + checkpoint to finish. + +

There should be a between and at the end of the hunk shown above.

Will fix. Thanks!

Just to clarify, that's the missing tag I am talking about above.

OK, so I added the tag just after the above.

+ + Whenever pg_basebackup is taking a base + backup, the pg_stat_progress_basebackup + view will contain a row for each WAL sender process that is currently + running BASE_BACKUP replication command + and streaming the backup.

I understand that you wrote "Whenever pg_basebackup is taking a backup...", because the description of other views contains a similar starting line. But it may not only be pg_basebackup that would be served by this view, no? It could be any tool that speaks Postgres' replication protocol and thus is able to send a BASE_BACKUP command. If that is correct, I would write something like "When an application is taking a backup" or some such, without a specific reference to pg_basebackup. Thoughts?

Yeah, there may be some such applications. But most users would use pg_basebackup, so getting rid of the reference to pg_basebackup would make the description a bit difficult to read. Also I can imagine that a user of those backup applications would get to know the progress reporting view from their documents. So I prefer the existing one or something like "Whenever an application like pg_basebackup ...". Thoughts?

Sure, "an application like pg_basebackup" sounds fine to me.

OK, I changed the doc that way. Attached is the updated version of the patch.
Attached is the updated version of the patch. The previous patch used only pgstat_progress_update_param() even when updating multiple values. Since those updates are not atomic, this can cause readers of the values to see the intermediate states. To avoid this issue, the latest patch uses pgstat_progress_update_multi_param(), instead. Attached the updated version of the patch. Barring any objections, I plan to commit this patch. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 87586a7b06..dd4a668eea 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -376,6 +376,14 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + pg_stat_progress_basebackuppg_stat_progress_basebackup + One row for each WAL sender process streaming a base backup, + showing current progress. + See . + + + @@ -3535,7 +3543,10 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, certain commands during command execution. Currently, the only commands which support progress reporting are ANALYZE, CLUSTER, - CREATE INDEX, and VACUUM. + CREATE INDEX, VACUUM, + and (i.e., replication + command that issues to take + a base backup). This may be expanded in the future. @@ -4336,6 +4347,156 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, + + + + Base Backup Progress Reporting + + + Whenever an application like pg_basebackup + is taking a base backup, the + pg_stat_progress_basebackup + view will contain a row for each WAL sender process that is currently + running BASE_BACKUP replication command + and streaming the backup. The tables below describe the information + that will be reported and provide information about how to interpret it. + + + + pg_stat_progress_basebackup View + + + + Column + Type + Description + + + + + + pid + integer + Process ID of a WAL sender process. 
+ + + phase + text + Current processing phase. See . + + + backup_total + bigint + + Total amount of data that will be streamed. If progress reporting + is not enabled in pg_basebackup + (i.e., --progress option is not specified), + this is 0. Otherwise, this is estimated and + reported as of the beginning of + streaming database files phase. Note that + this is only an approximation since the database + may change during streaming database files phase + and WAL log may be included in the backup later. This is always + the same value as backup_streamed + once the amount of data streamed exceeds the
Re: Trying to pull up EXPR SubLinks
On Fri, Feb 28, 2020 at 11:35 PM Tom Lane wrote: > Richard Guo writes: > > On Fri, Feb 28, 2020 at 3:02 PM Andy Fan > wrote: > >> Glad to see this. I think the hard part is this transform is not > *always* > >> good. for example foo.a only has 1 rows, but bar has a lot of rows, if > >> so the original would be the better one. > > > Yes exactly. TBH I'm not sure how to achieve that. > > Yeah, I was about to make the same objection when I saw Andy already had. > Without some moderately-reliable way of estimating whether the change > is actually a win, I think we're better off leaving it out. The user > can always rewrite the query for themselves if the grouped implementation > would be better -- but if the planner just does it blindly, there's no > recourse when it's worse. > Yes, that makes sense. > > > Any ideas on this part? > > I wonder whether it'd be possible to rewrite the query, but then > consider two implementations, one where the equality clause is > pushed down into the aggregating subquery as though it were LATERAL. > You'd want to be able to figure out that the presence of that clause > made it unnecessary to do the GROUP BY ... but having done so, a > plan treating the aggregating subquery as LATERAL ought to be pretty > nearly performance-equivalent to the current way. So this could be > mechanized in the current planner structure by treating that as a > parameterized path for the subquery, and comparing it to unparameterized > paths that calculate the full grouped output. > I suppose this would happen in/around function set_subquery_pathlist. When we generate access paths for the subquery, we try to push down the equality clause into subquery, remove the unnecessary GROUP BY, etc. and then perform another run of subquery_planner to generate the parameterized path, and add it to the RelOptInfo for the subquery. So that we can do comparison to unparameterized paths. Am I understanding it correctly? 
> > Obviously it'd be a long slog from here to there, but it seems like > maybe that could be made to work. There's a separate question about > whether it's really worth the trouble, seeing that the optimization > is available today to people who rewrite their queries. > If I understand correctly as above, yes, this would take quite a lot of effort. Not sure if it's still worth doing. Thanks Richard