Re: Is Recovery actually paused?
On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada wrote:
>
> On Tue, Jan 26, 2021 at 2:00 AM Robert Haas wrote:
> >
> > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy wrote:
> > > +1 to just show the recovery pause state in the output of
> > > pg_is_wal_replay_paused. But, should the function name
> > > "pg_is_wal_replay_paused" be something like
> > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> > > in a function, I expect a boolean output. Others may have better
> > > thoughts.
> >
> > Maybe we should leave the existing function pg_is_wal_replay_paused()
> > alone and add a new one with the name you suggest that returns text.
> > That would create less burden for tool authors.
>
> +1

Yeah, we can do that, I will send an updated patch soon.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
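For illustration, the proposal above amounts to keeping the existing boolean function unchanged and adding a second, text-returning one. The sketch below assumes the function name suggested in the thread; the return values are only examples, not a committed interface:

    -- existing function, left alone so that tools keep working
    SELECT pg_is_wal_replay_paused();          -- returns boolean
    -- proposed addition returning the actual pause state as text,
    -- e.g. distinguishing a requested pause from one that is in effect
    SELECT pg_get_wal_replay_pause_state();    -- returns text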
Re: simplifying foreign key/RI checks
Hi Amit-san,

Thanks for the answer!

> If you only tested insert/update on the referencing table, I would've
> expected to see nothing in the result of that query, because the patch
> eliminates all use of SPI in that case. I suspect the CachedPlan*
> memory contexts you are seeing belong to some early activity in the
> session. So if you try the insert/update in a freshly started
> session, you would see 0 rows in the result of that query.

That's right. The CREATE PARTITION TABLE commands included in the test
script (rep.sql) were using SPI. In a new session, I confirmed that no
CachedPlan is generated when only the INSERTs are executed.

# only execute INSERT
postgres=# INSERT INTO ps SELECT generate_series(1,4999);
INSERT 0 4999
postgres=#
postgres=# INSERT INTO pr SELECT i, i from generate_series(1,4999)i;
INSERT 0 4999
postgres=# SELECT name, sum(used_bytes) as bytes, pg_size_pretty(sum(used_bytes))
           FROM pg_backend_memory_contexts WHERE name LIKE 'Cached%' GROUP BY name;
 name | bytes | pg_size_pretty
------+-------+----------------
(0 rows)

★ No CachedPlan

> Hmm, the patch tries to solve a general problem that SPI plans are not
> being shared among partitions whereas they should be. So I don't
> think that it's necessarily specific to DELETE. Until we have a
> solution like the patch on this thread for DELETE, it seems fine to
> consider the other patch as a stopgap solution.

I see. So this is a solution to the problem of using SPI plans with
partitions, not just for DELETE. I agree with you; I think this is a
solution to the current problem.

Best Regards,
--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com
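The rep.sql script referred to above is attached to an earlier message and is not reproduced here. For readers following along, a minimal setup consistent with the INSERT statements shown might look like the following; the column names, partition bounds, and the foreign key from pr to ps are assumptions for illustration, not the actual contents of rep.sql:

    -- referenced partitioned table
    CREATE TABLE ps (a int PRIMARY KEY) PARTITION BY RANGE (a);
    CREATE TABLE ps_1 PARTITION OF ps FOR VALUES FROM (1) TO (5000);
    -- referencing partitioned table; the RI check fired on INSERT here
    -- is what the patch avoids running through SPI
    CREATE TABLE pr (a int, b int REFERENCES ps (a)) PARTITION BY RANGE (a);
    CREATE TABLE pr_1 PARTITION OF pr FOR VALUES FROM (1) TO (5000);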
RE: Parallel Inserts in CREATE TABLE AS
Hi Bharath,

I chose 5 cases that pick a parallel insert plan in CTAS to measure the patched performance. Each case was run 30 times.

Most of the test executions become faster with this patch. However, Test NO 4 (create table xxx as table xxx) shows a performance degradation. I tested various table sizes (2/10/20 million rows), and they all show a 6%-10% decline. I think this problem may need some checking.

Below are my test results. 'Test NO' corresponds to 'Test NO' in the attached test_ctas.sql file. reg%=(patched-master)/master

Test NO | Test Case                      | reg% | patched(ms) | master(ms)
--------|--------------------------------|------|-------------|------------
      1 | CTAS select from table         |  -9% | 16709.50477 | 18370.76660
      2 | Append plan                    | -14% | 16542.97807 | 19305.86600
      3 | initial plan under Gather node |  -5% | 13374.27187 | 14120.02633
      4 | CTAS table                     |  10% | 20835.48800 | 18986.40350
      5 | CTAS select from execute       |  -6% | 16973.73890 | 18008.59789

About Test NO 4:

In master (HEAD), this test case picks a serial seq scan. The query plan looks like:
--
 Seq Scan on public.tenk1  (cost=0.00..444828.12 rows=1012 width=244) (actual time=0.005..1675.268 rows=1000 loops=1)
   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
 Planning Time: 0.053 ms
 Execution Time: 20165.023 ms

With this patch, it will choose a parallel seq scan and parallel insert. The query plan looks like:
--
 Gather  (cost=1000.00..370828.03 rows=1012 width=244) (actual time=20428.823..20437.143 rows=0 loops=1)
   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
   Workers Planned: 4
   Workers Launched: 4
   ->  Create test
   ->  Parallel Seq Scan on public.tenk1  (cost=0.00..369828.03 rows=253 width=244) (actual time=0.021..411.094 rows=200 loops=5)
         Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
         Worker 0:  actual time=0.023..390.856 rows=1858407 loops=1
         Worker 1:  actual time=0.024..468.587 rows=2264494 loops=1
         Worker 2:  actual time=0.023..473.170 rows=2286580 loops=1
         Worker 3:  actual time=0.027..373.727 rows=1853216 loops=1
 Planning Time: 0.053 ms
 Execution Time: 20437.643 ms

test machine spec: CentOS 8.2, 128G RAM, 40 processors, disk SAS

Regards,
Tang

test_ctas.sql
Description: test_ctas.sql
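For reference, the regressed case (Test NO 4) is the "CREATE TABLE ... AS TABLE ..." form, as opposed to the select-based form of Test NO 1. A rough sketch of the two statements, using the tenk1 table that appears in the plans above (the destination table names are made up; the exact statements are in the attached test_ctas.sql):

    -- Test NO 4: regressed with the patch (parallel insert is chosen, but slower)
    CREATE TABLE ctas_test4 AS TABLE tenk1;
    -- Test NO 1: improved with the patch
    CREATE TABLE ctas_test1 AS SELECT * FROM tenk1;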
Re: Support for NSS as a libpq TLS backend
On Wed, Jan 20, 2021 at 05:07:08PM +, Jacob Champion wrote: > Lovely. I didn't expect *removing* an extension to effectively *add* > more, but I'm glad it works now. My apologies for chiming in. I was looking at your patch set here, and while reviewing the strong random and cryptohash parts I have found a couple of mistakes in the ./configure part. I think that the switch from --with-openssl to --with-ssl={openssl} could just be done independently as a building piece of the rest, then the first portion based on NSS could just add the minimum set in configure.ac. Please note that the patch set has been using autoconf from Debian, or something forked from upstream. There were also missing updates in several parts of the code base, and a lack of docs for the new switch. I have spent time checking that with --with-openssl to make sure that the obsolete grammar is still compatible, --with-ssl=openssl and also without it. Thoughts? -- Michael From b3e18564697bdb36a8eba4afe6434294676b528b Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 27 Jan 2021 16:36:45 +0900 Subject: [PATCH] Introduce --with-ssl={openssl} in configure options --- src/include/pg_config.h.in| 2 +- src/backend/libpq/Makefile| 2 +- src/backend/libpq/hba.c | 2 +- src/common/Makefile | 2 +- src/interfaces/libpq/Makefile | 2 +- src/test/Makefile | 2 +- src/test/modules/Makefile | 2 +- .../modules/ssl_passphrase_callback/Makefile | 2 +- .../ssl_passphrase_callback/t/001_testfunc.pl | 2 +- src/test/ssl/Makefile | 2 +- src/test/ssl/t/001_ssltests.pl| 2 +- src/test/ssl/t/002_scram.pl | 2 +- doc/src/sgml/installation.sgml| 5 +- doc/src/sgml/pgcrypto.sgml| 2 +- doc/src/sgml/sslinfo.sgml | 2 +- contrib/Makefile | 2 +- contrib/pgcrypto/Makefile | 4 +- configure | 110 +++--- configure.ac | 31 +++-- src/Makefile.global.in| 2 +- src/tools/msvc/Solution.pm| 2 +- src/tools/msvc/config_default.pl | 2 +- 22 files changed, 113 insertions(+), 73 deletions(-) diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in index f4d9f3b408..55cab4d2bf 100644 --- a/src/include/pg_config.h.in +++ b/src/include/pg_config.h.in @@ -899,7 +899,7 @@ /* Define to select named POSIX semaphores. */ #undef USE_NAMED_POSIX_SEMAPHORES -/* Define to build with OpenSSL support. (--with-openssl) */ +/* Define to build with OpenSSL support. (--with-ssl=openssl) */ #undef USE_OPENSSL /* Define to 1 to build with PAM support. 
(--with-pam) */ diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile index efc5ef760a..8d1d16b0fc 100644 --- a/src/backend/libpq/Makefile +++ b/src/backend/libpq/Makefile @@ -28,7 +28,7 @@ OBJS = \ pqmq.o \ pqsignal.o -ifeq ($(with_openssl),yes) +ifeq ($(with_ssl),openssl) OBJS += be-secure-openssl.o endif diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c index 371dccb852..20bf1461ce 100644 --- a/src/backend/libpq/hba.c +++ b/src/backend/libpq/hba.c @@ -1041,7 +1041,7 @@ parse_hba_line(TokenizedLine *tok_line, int elevel) ereport(elevel, (errcode(ERRCODE_CONFIG_FILE_ERROR), errmsg("hostssl record cannot match because SSL is not supported by this build"), - errhint("Compile with --with-openssl to use SSL connections."), + errhint("Compile with --with-ssl=openssl to use SSL connections."), errcontext("line %d of configuration file \"%s\"", line_num, HbaFileName))); *err_msg = "hostssl record cannot match because SSL is not supported by this build"; diff --git a/src/common/Makefile b/src/common/Makefile index 1a1d0d3406..5422579a6a 100644 --- a/src/common/Makefile +++ b/src/common/Makefile @@ -80,7 +80,7 @@ OBJS_COMMON = \ wait_error.o \ wchar.o -ifeq ($(with_openssl),yes) +ifeq ($(with_ssl),openssl) OBJS_COMMON += \ protocol_openssl.o \ cryptohash_openssl.o diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile index c4fde3f93d..2b7ace52e4 100644 --- a/src/interfaces/libpq/Makefile +++ b/src/interfaces/libpq/Makefile @@ -45,7 +45,7 @@ OBJS = \ pqexpbuffer.o \ fe-auth.o -ifeq ($(with_openssl),yes) +ifeq ($(with_ssl),openssl) OBJS += \ fe-secure-common.o \ fe-secure-openssl.o diff --git a/src/test/Makefile b/src/test/Makefile index ab1ef9a475..f7859c2fd5 100644 --- a/src/test/Makefile +++ b/src/test/Makefile @@ -28,7 +28,7 @@ ifneq (,$(filter ldap,$(PG_TEST_EXTRA))) SUBDIRS += ldap endif endif -ifeq ($(with_openssl),yes) +ifeq ($(with_ssl),openssl) ifneq (,$(filter ssl,$(PG_TEST_EXTRA)))
Re: Transactions involving multiple postgres foreign servers, take 2
On Sat, Jan 16, 2021 at 1:39 AM Zhihong Yu wrote:
>
> Hi,

Thank you for reviewing the patch!

> For v32-0004-Add-PrepareForeignTransaction-API.patch :
>
> + * Whenever a foreign transaction is processed, the corresponding FdwXact
> + * entry is update. To avoid holding the lock during transaction processing
> + * which may take an unpredicatable time the in-memory data of foreign
>
> entry is update -> entry is updated
>
> unpredicatable -> unpredictable

Fixed.

> + int nlefts = 0;
>
> nlefts -> nremaining
>
> + elog(DEBUG1, "left %u foreign transactions", nlefts);
>
> The message can be phrased as "%u foreign transactions remaining"

Fixed.

> +FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)
>
> Fdw and Xact are repeated. Seems one should suffice. How about naming the
> method FdwXactResolveTransactions() ?
> Similar comment for FdwXactResolveOneFdwXact(FdwXact fdwxact)

Agreed. I changed these to ResolveFdwXacts() and ResolveOneFdwXact()
respectively to avoid a long function name.

> For get_fdwxact():
>
> + /* This entry matches the condition */
> + found = true;
> + break;
>
> Instead of breaking and returning, you can return within the loop directly.

Fixed.

Those changes are incorporated into the latest version patches[1] I
submitted today.

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoBYyA5O%2BFPN4Cs9YWiKjq319BvF5fYmKNsFTZfwTcWjQw%40mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Re: Is Recovery actually paused?
On Tue, Jan 26, 2021 at 2:00 AM Robert Haas wrote:
>
> On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy wrote:
> > +1 to just show the recovery pause state in the output of
> > pg_is_wal_replay_paused. But, should the function name
> > "pg_is_wal_replay_paused" be something like
> > "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> > in a function, I expect a boolean output. Others may have better
> > thoughts.
>
> Maybe we should leave the existing function pg_is_wal_replay_paused()
> alone and add a new one with the name you suggest that returns text.
> That would create less burden for tool authors.

+1

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
RE: fix typo in reorderbuffer.c
> >> Combocid's ==>> Combocids
> >
> > Ping again, just in case it’s missed.
>
> There are in the comments "ComboCIDs", "combocids" and "ComboCids" on top
> of the existing Combocid's. Your patch introduces a fourth way of writing
> that. It may be better to just have one way to govern them all.

Hi Michael,

Thanks for the review.

Actually, "Combocid's" in the patch is the first word of the sentence, so the
first character is uppercase (maybe not a new style). I agree that it's better
to have a common way, but some of the differing styles of "combocid" follow a
variable or function name[1], so we may need to change the variable or
function names too.

Personally, I prefer "combocids". But do you think we can change that in a
separate patch, apart from this typo patch?

[1]:
void SerializeComboCIDState(Size maxsize, char *start_address)
static int usedComboCids = 0; /* number of elements in comboCids */
...

Best regards,
houzj
Re: FailedAssertion in heap_index_delete_tuples at heapam.c:7220
On Tue, Jan 26, 2021 at 10:52 PM Jaime Casanova wrote:
> ${subject} happened while executing ${attached query} at regression
> database, using 14dev (commit d5a83d79c9f9b660a6a5a77afafe146d3c8c6f46)
> and produced ${attached stack trace}.

I see the bug: gistprunepage() calls index_compute_xid_horizon_for_tuples()
(which ultimately calls the heapam.c callback for heap_index_delete_tuples())
with an empty array, which we don't expect.

The similar code within _hash_vacuum_one_page() already only calls
index_compute_xid_horizon_for_tuples() when ndeletable > 0. The fix is
obvious: bring gistprunepage() in line with _hash_vacuum_one_page().

I'll go push a fix for that now. Thanks for the report!

--
Peter Geoghegan
Re: simplifying foreign key/RI checks
Hi Amit-san,

+           case TM_Invisible:
+               elog(ERROR, "attempted to lock invisible tuple");
+               break;
+
+           case TM_SelfModified:
+           case TM_BeingModified:
+           case TM_WouldBlock:
+               elog(ERROR, "unexpected table_tuple_lock status: %u", res);
+               break;
+           default:
+               elog(ERROR, "unrecognized table_tuple_lock status: %u", res);

All of these are meant as debugging elog()s for cases that won't normally
occur. IIUC, the discussion at the linked thread excludes those from
consideration.

Thanks for your explanation. Ah, I reread the thread, and I now realize that
user-visible log messages are the target to replace. I understand that those
elog()s are for cases that won't normally occur. Sorry for the noise.

Regards,
Tatsuro Yamada
Re: New IndexAM API controlling index vacuum strategies
On Fri, Jan 22, 2021 at 2:00 PM Peter Geoghegan wrote: > > On Tue, Jan 19, 2021 at 2:57 PM Peter Geoghegan wrote: > > Looks good. I'll give this version a review now. I will do a lot more > > soon. I need to come up with a good benchmark for this, that I can > > return to again and again during review as needed. > > I performed another benchmark, similar to the last one but with the > latest version (v2), and over a much longer period. Attached is a > summary of the whole benchmark, and log_autovacuum output from the > logs of both the master branch and the patch. Thank you for performing the benchmark! > > This was pgbench scale 2000, 4 indexes on pgbench_accounts, and a > transaction with one update and two selects. Each run was 4 hours, and > we alternate between patch and master for each run, and alternate > between 16 and 32 clients. There were 8 4 hour runs in total, meaning > the entire set of runs took 8 * 4 hours = 32 hours (not including > initial load time and a few other small things like that). I used a > 10k TPS rate limit, so TPS isn't interesting here. Latency is > interesting -- we see a nice improvement in latency (i.e. a reduction) > for the patch (see all.summary.out). What value is set to fillfactor? > > The benefits of the patch are clearly visible when I drill down and > look at the details. Each pgbench_accounts autovacuum VACUUM operation > can finish faster with the patch because they can often skip at least > some indexes (usually the PK, sometimes 3 out of 4 indexes total). But > it's more subtle than some might assume. We're skipping indexes that > VACUUM actually would have deleted *some* index tuples from, which is > very good. Bottom-up index deletion is usually lazy, and only > occasionally very eager, so you still have plenty of "floating > garbage" index tuples in most pages. And now we see VACUUM behave a > little more like bottom-up index deletion -- it is lazy when that is > appropriate (with indexes that really only have floating garbage that > is spread diffusely throughout the index structure), and eager when > that is appropriate (with indexes that get much more garbage). That's very good. I'm happy that this patch efficiently utilizes bottom-up index deletion feature. Looking at the relation size growth, there is almost no difference between master and patched in spite of skipping some vacuums in the patched test, which is also good. > > The benefit is not really that we're avoiding doing I/O for index > vacuuming (though that is one of the smaller benefits here). The real > benefit is that VACUUM is not dirtying pages, since it skips indexes > when it would be "premature" to vacuum them from an efficiency point > of view. This is important because we know that Postgres throughput is > very often limited by page cleaning. Also, the "economics" of this new > behavior make perfect sense -- obviously it's more efficient to delay > garbage cleanup until the point when the same page will be modified by > a backend anyway -- in the case of this benchmark via bottom-up index > deletion (which deletes all garbage tuples in the leaf page at the > point that it runs for a subset of pointed-to heap pages -- it's not > using an oldestXmin cutoff from 30 minutes ago). So whenever we dirty > a page, we now get more value per additional-page-dirtied. 
> > I believe that controlling the number of pages dirtied by VACUUM is > usually much more important than reducing the amount of read I/O from > VACUUM, for reasons I go into on the recent "vacuum_cost_page_miss > default value and modern hardware" thread. As a further consequence of > all this, VACUUM can go faster safely and sustainably (since the cost > limit is not affected so much by vacuum_cost_page_miss), which has its > own benefits (e.g. oldestXmin cutoff doesn't get so old towards the > end). > > Another closely related huge improvement that we see here is that the > number of FPIs generated by VACUUM can be significantly reduced. This > cost is closely related to the cost of dirtying pages, but it's worth > mentioning separately. You'll see some of that in the log_autovacuum > log output I attached. Makes sense. > > There is an archive with much more detailed information, including > dumps from most pg_stat_* views at key intervals. This has way more > information than anybody is likely to want: > > https://drive.google.com/file/d/1OTiErELKRZmYnuJuczO2Tfcm1-cBYITd/view?usp=sharing > > I did notice a problem, though. I now think that the criteria for > skipping an index vacuum in the third patch from the series is too > conservative, and that this led to an excessive number of index > vacuums with the patch. Maybe that's why there are 5 autovacuum runs on pgbench_accounts in the master branch whereas there are 7 runs in the patched? > This is probably because there was a tiny > number of page splits in some of the indexes that were not really > supposed to grow. I believe that this is caused by ANALYZE
FailedAssertion in heap_index_delete_tuples at heapam.c:7220
Hi, ${subject} happened while executing ${attached query} at regresssion database, using 14dev (commit d5a83d79c9f9b660a6a5a77afafe146d3c8c6f46) and produced ${attached stack trace}. Sadly just loading the regression database and executing this query is not enough to reproduce. Not sure what else I can do to help with this one. -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 set = {__val = {0, 140734235450160, 2, 6, 6456150, 94891112648656, 4611686018427388799, 140604537363110, 0, 281470681751457, 0, 0, 0, 0, 0, 0}} pid = tid = ret = #1 0x7fe10b488535 in __GI_abort () at abort.c:79 save_stage = 1 act = {__sigaction_handler = {sa_handler = 0x3feb0800, sa_sigaction = 0x3feb0800}, sa_mask = {__val = {0, 4605253854145256240, 0, 13781012405103410633, 0, 140604535119861, 2, 7220455706289059512, 7018409646405740088, 0, 7003723482445100592, 0, 3166617362634023936, 140734235450400, 684527088, 140734235451264}}, sa_flags = 0, sa_restorer = 0x28cd0df0} sigs = {__val = {32, 0 }} #2 0x564d8f7748be in ExceptionalCondition (conditionName=0x564d8f800559 "delstate->ndeltids > 0", errorType=0x564d8f7fe869 "FailedAssertion", fileName=0x564d8f7fe948 "heapam.c", lineNumber=7220) at assert.c:69 No locals. #3 0x564d8f176825 in heap_index_delete_tuples (rel=0x7fe1022847c8, delstate=0x7fff3e1ca120) at heapam.c:7220 latestRemovedXid = 0 blkno = 4294967295 buf = 0 page = 0x0 maxoff = 0 priorXmax = 2446144864 prefetch_state = {cur_hblkno = 4294967295, next_item = 0, ndeltids = 0, deltids = 0x564d91cd3748} prefetch_distance = 10 SnapshotNonVacuumable = {snapshot_type = SNAPSHOT_NON_VACUUMABLE, xmin = 0, xmax = 2446145024, xip = 0x216, xcnt = 0, subxip = 0x, subxcnt = 1042063264, suboverflowed = 255, takenDuringRecovery = 127, copied = false, curcid = 1042063408, speculativeToken = 32767, vistest = 0x564d8fab1c10 , active_count = 2446145040, regd_count = 22093, ph_node = { first_child = 0x7fff3e1ca070, next_sibling = 0x7fff3e1ca060, prev_or_parent = 0x564d91ddda08}, whenTaken = 462, lsn = 18446744071860729372, snapXactCompletionCount = 0} finalndeltids = 0 nblocksaccessed = 0 nblocksfavorable = 0 curtargetfreespace = 0 lastfreespace = 0 actualfreespace = 0 bottomup_final_block = false #4 0x564d8f195780 in table_index_delete_tuples (rel=0x7fe1022847c8, delstate=0x7fff3e1ca120) at ../../../../src/include/access/tableam.h:1240 No locals. 
#5 0x564d8f195e6a in index_compute_xid_horizon_for_tuples (irel=0x564d924f6428, hrel=0x7fe1022847c8, ibuf=807, itemnos=0x7fff3e1ca190, nitems=0) at genam.c:331 delstate = {bottomup = false, bottomupfreespace = 0, ndeltids = 0, deltids = 0x564d91cd3748, status = 0x564d91cd3768} latestRemovedXid = 0 ipage = 0x7fe102fc5f00 "" itup = 0x7fff3e1ca630 #6 0x564d8f13e568 in gistprunepage (rel=0x564d924f6428, page=0x7fe102fc5f00 "", buffer=807, heapRel=0x7fe1022847c8) at gist.c:1669 deletable = {1, 0, 0, 0, 46976, 37580, 22016, 0, 41520, 15900, 32767, 0, 42544, 15900, 32767, 0, 42752, 15900, 0, 0, 2674, 2890, 32737, 0, 1, 0, 25580, 15290, 41984, 48994, 5558, 11250, 42752, 15900, 32767, 0, 52880, 36628, 22093, 0, 46976, 37580, 22093, 0, 13728, 37325, 22093, 0, 32512, 765, 32737, 0, 25640, 37455, 22093, 0, 0, 0, 2, 0, 65535, 255, 0, 0, 41472, 15900, 32767, 0, 14468, 566, 32737, 0, 41536, 15900, 0, 256, 14468, 566, 32737, 0, 13736, 37325, 22093, 0, 25640, 37455, 22093, 0, 0, 0, 0, 0, 0, 512, 32737, 0, 41584, 15900, 32767, 0, 47989, 36700, 22093, 0, 41712, 15900, 0, 256, 14468, 566, 32737, 0, 41632, 15900, 32767, 0, 56219, 36700, 22093, 0, 0, 0, 0, 0, 14464, 566, 32737, 0, 0, 0, 3, 0, 21856, 174, 0, 8192, 41680, 15900, 32767, 0, 56472, 36700, 22093, 0, 11968, 37108, 22093, 0, 0, 0, 0, 0, 14480, 566, 32737, 0, 14464, 566, 32737, 0, 41712, 15900, 32767, 0, 54083, 36639, 22093, 0, 40672, 765, 32737, 0, 25640, 37455, 22093, 0, 0, 0, 0, 0, 0, 512, 32737, 0, 41776, 15900, 0, 256, 14468, 566, 32737, 0, 41776, 15900, 32767, 0, 47349, 36700, 22093, 0...} ndeletable = 0 offnum = 292 maxoff = 291 latestRemovedXid = 0 #7 0x564d8f139e3e in gistplacetopage (rel=0x564d924f6428, freespace=0, giststate=0x564d92ccb780, buffer=807, itup=0x7fff3e1ca6e0, ntup=1, oldoffnum=0, newblkno=0x0, leftchildbuf=0, splitinfo=0x7fff3e1ca690, markfollowright=true, heapRel=0x7fe1022847c8, is_build=false) at gist.c:274 blkno = 4 page = 0x7fe102fc5f00 "" is_leaf = true recptr = 94891117474655 i = 32767 is_split = true
Re: simplifying foreign key/RI checks
Yamada-san,

On Wed, Jan 27, 2021 at 8:51 AM Tatsuro Yamada wrote:
> On 2021/01/25 18:19, Amit Langote wrote:
> > On Mon, Jan 25, 2021 at 9:24 AM Corey Huinker wrote:
> >> Anybody else want to look this patch over before I mark it Ready For
> >> Committer?
> >
> > Would be nice to have others look it over. Thanks.
>
> Thanks for creating the patch!
>
> I tried to review the patch. Here is my comment.

Thanks for the comment.

> * According to this thread [1], it might be better to replace elog() with
>   ereport() in the patch.
>
> [1]:
> https://www.postgresql.org/message-id/flat/92d6f545-5102-65d8-3c87-489f71ea0a37%40enterprisedb.com

Could you please tell which elog() of the following added by the patch you
are concerned about?

+           case TM_Invisible:
+               elog(ERROR, "attempted to lock invisible tuple");
+               break;
+
+           case TM_SelfModified:
+           case TM_BeingModified:
+           case TM_WouldBlock:
+               elog(ERROR, "unexpected table_tuple_lock status: %u", res);
+               break;
+           default:
+               elog(ERROR, "unrecognized table_tuple_lock status: %u", res);

All of these are meant as debugging elog()s for cases that won't normally
occur. IIUC, the discussion at the linked thread excludes those from
consideration.

--
Amit Langote
EDB: http://www.enterprisedb.com
Re: fix typo in reorderbuffer.c
On Wed, Jan 27, 2021 at 05:57:06AM +, Hou, Zhijie wrote:
>> Combocid's ==>> Combocids
>
> Ping again, just in case it’s missed.

There are in the comments "ComboCIDs", "combocids" and "ComboCids" on top
of the existing Combocid's. Your patch introduces a fourth way of writing
that. It may be better to just have one way to govern them all.

--
Michael

signature.asc
Description: PGP signature
Re: [bug fix] ALTER TABLE SET LOGGED/UNLOGGED on a partitioned table does nothing silently
At Wed, 27 Jan 2021 05:30:29 +, "tsunakawa.ta...@fujitsu.com" wrote in
> From: Kyotaro Horiguchi
> > "CREATE TABLE" is not "CREATE LOGGED TABLE". We can assume that as
> > "CREATE TABLE", where the default logged-ness varies according to context.
> > Or it might have been so since the beginning. Currently we don't have the
> > syntax "CREATE LOGGED TABLE", but we can add that syntax.
>
> Yes, I thought of that possibility after a while too. But I felt a bit(?)
> hesitant to do it considering back-patching. Also, the idea requires ALTER
> TABLE ATTACH PARTITION will have to modify the logged-ness property of the
> target partition and its subpartitions with that of the parent partitioned
> table. However, your idea is one candidate worth pursuing, including whether
> or not to back-patch what.

I think this and all possible similar changes won't be back-patched at all.
It would be a disaster for any established version to change this kind of
behavior.

> > We pursue releasing all fixes at once but we might release all fixes other
> > than some items that cannot be fixed for some technical reasons at the
> > time, like REPLICA IDENTITY.
> >
> > I'm not sure how long we will wait for the time of release, though.
>
> Anyway, we have to define the ideal behavior for each ALTER action based on
> some comprehensible principle. Yeah... this may become a long, tiring
> journey. (I anticipate some difficulty and compromise in reaching agreement,
> as was seen in the past discussion for the fix for ALTER TABLE REPLICA
> IDENTITY. Scary)

It seems to me that we have agreed that the ideal behavior is for ALTER TABLE
on a parent to propagate to descendants. One objection raised is that such
behavior is dangerous for operations that require a large amount of
resources, which could fail only after a long time has elapsed. The
difficulty revealed for REPLICA IDENTITY is regarded as a technical issue
that should be overcome.

> FWIW, I wonder why Postgres decided to allow different logical structure of
> tables such as DEFAULT values and constraints between the parent partitioned
> table and a child partition. That extra flexibility might stand in the way
> to consensus. I think it'd be easy to understand that partitions are simply
> physically independent containers that have identical logical structure.

I understand that -- once upon a time the "traditional" partitioning was
merely a use case of inheritance tables. Parents and children are basically
independent from each other, other than inherited attributes. The nature of
table inheritance compels column additions to a parent to be inherited by
children, but other changes need not be propagated. As time went by,
partitioning became the major use, and new features added at that time were
shaped by that recognition. After the appearance of native partitioning,
table inheritance seems to be regarded as a legacy that we need to maintain.

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
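For readers new to this thread, the behavior under discussion can be reproduced with something like the following sketch (not taken from the thread; table names are made up). Per the subject line, the ALTER TABLE is accepted but changes nothing:

    CREATE TABLE p (a int) PARTITION BY LIST (a);
    CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1);
    ALTER TABLE p SET UNLOGGED;   -- completes silently
    SELECT relname, relpersistence FROM pg_class WHERE relname IN ('p', 'p1');
    -- both rows still show 'p' (permanent): neither the parent nor the
    -- existing partition was switched to unlogged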
RE: fix typo in reorderbuffer.c
> I found a possible typo in reorderbuffer.c
>
> * has got a combocid. Combocid's are only valid for the duration of a
>
> Combocid's ==>> Combocids
>
> Attaching a small patch to correct it.

Ping again, just in case it’s missed.
Re: doc review for v14
On Sat, Jan 23, 2021 at 07:15:40PM +0900, Michael Paquier wrote: > I have been looking at 0005, the patch dealing with the docs of the > replication stats, and have some comments. And attached is a patch to clarify all that. I am letting that sleep for a couple of days for now, so please let me know if you have any comments. -- Michael diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 9496f76b1f..c602ee4427 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -317,7 +317,7 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser pg_stat_replication_slotspg_stat_replication_slots One row per replication slot, showing statistics about - replication slot usage. + the replication slot's usage. See pg_stat_replication_slots for details. @@ -2604,10 +2604,10 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i spill_txns bigint -Number of transactions spilled to disk after the memory used by -logical decoding of changes from WAL for this slot exceeds +Number of transactions spilled to disk once the memory used by +logical decoding to decode changes from WAL has exceeded logical_decoding_work_mem. The counter gets -incremented both for toplevel transactions and subtransactions. +incremented for both toplevel transactions and subtransactions. @@ -2616,9 +2616,10 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i spill_count bigint -Number of times transactions were spilled to disk while decoding changes -from WAL for this slot. Transactions may get spilled repeatedly, and -this counter gets incremented on every such invocation. +Number of times transactions were spilled to disk while decoding +changes from WAL for this slot. This counter is incremented each time +a transaction is spilled, and the same transaction may be spilled +multiple times. @@ -2639,11 +2640,12 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i stream_txns bigint -Number of in-progress transactions streamed to the decoding output plugin -after the memory used by logical decoding of changes from WAL for this -slot exceeds logical_decoding_work_mem. Streaming only +Number of in-progress transactions streamed to the decoding output +plugin after the memory used by logical decoding to decode changes +from WAL for this slot has exceeded +logical_decoding_work_mem. Streaming only works with toplevel transactions (subtransactions can't be streamed -independently), so the counter does not get incremented for subtransactions. +independently), so the counter is not incremented for subtransactions. @@ -2653,9 +2655,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i Number of times in-progress transactions were streamed to the decoding -output plugin while decoding changes from WAL for this slot. Transactions -may get streamed repeatedly, and this counter gets incremented on every -such invocation. +output plugin while decoding changes from WAL for this slot. This +counter is incremented each time a transaction is streamed, and the +same transaction may be streamed multiple times. @@ -5042,7 +5044,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i - + pg_stat_reset_replication_slot @@ -5050,11 +5052,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i void - Resets statistics to zero for a single replication slot, or for all - replication slots in the cluster. 
The argument can be either the name - of the slot to reset the stats or NULL. If the argument is NULL, all - counters shown in the pg_stat_replication_slots - view for all replication slots are reset. +Resets statistics of the replication slot defined by the argument. If +the argument is NULL, resets statistics for all +the replication slots. This function is restricted to superusers by default, but other users signature.asc Description: PGP signature
Re: Columns correlation and adaptive query optimization
On Mon, 25 Jan 2021 16:27:25 +0300 Konstantin Knizhnik wrote: > Hello, > > Thank you for review. > My answers are inside. Thank you for updating the patch and answering my questions. > > (2) > > If I understand correctly, your proposal consists of the following two > > features. > > > > 1. Add a feature to auto_explain that creates an extended statistic > > automatically > > if an error on estimated rows number is large. > > > > 2. Improve rows number estimation of join results by considering functional > > dependencies between vars in the join qual if the qual has more than one > > clauses, > > and also functional dependencies between a var in the join qual and vars in > > quals > > of the inner/outer relation. > > > > As you said, these two parts are independent each other, so one feature > > will work > > even if we don't assume the other. I wonder it would be better to split > > the patch > > again, and register them to commitfest separately. > > I agree with you that this are two almost unrelated changes, although > without clausesel patch additional statistic can not improve query planning. I think extended statistics created by the auto_explain patch can improve query planning even without the clausesel patch. For example, suppose the following case: postgres=# create table t ( i int, j int); CREATE TABLE postgres=# insert into t select i/10, i/100 from generate_series(1,100) i; INSERT 0 100 postgres=# analyze t; ANALYZE postgres=# explain analyze select * from t where i = 100 and j = 10; QUERY PLAN - Seq Scan on t (cost=0.00..19425.00 rows=1 width=8) (actual time=0.254..97.293 rows=10 loops=1) Filter: ((i = 100) AND (j = 10)) Rows Removed by Filter: 90 Planning Time: 0.199 ms Execution Time: 97.327 ms (5 rows) After applying the auto_explain patch (without clausesel patch) and issuing the query, additional statistics were created. postgres=# select * from t where i = 100 and j = 10; LOG: Add statistics t_i_j Then, after analyze, the row estimation was improved. postgres=# analyze t; ANALYZE postgres=# explain analyze select * from t where i = 100 and j = 10; postgres=# explain analyze select * from t where i = 100 and j = 10; QUERY PLAN -- Seq Scan on t (cost=0.00..19425.00 rows=10 width=8) (actual time=0.255..95.347 rows=10 loops=1) Filter: ((i = 100) AND (j = 10)) Rows Removed by Filter: 90 Planning Time: 0.124 ms Execution Time: 95.383 ms (5 rows) So, I think the auto_explain patch is useful with just that as a tool to detect a gap between estimate and real and adjust the plan. Also, the clausesel patch would be useful without the auto_explain patch if an appropriate statistics are created. > But I already have too many patches at commitfest. > May be it will be enough to spit this patch into two? Although we can discuss both of these patches in this thread, I wonder we don't have to commit them together. > > > > (3) > > + DefineCustomBoolVariable("auto_explain.suggest_only", > > +"Do not create > > statistic but just record in WAL suggested create statistics statement.", > > +NULL, > > + > > _explain_suggest_on > > > > To imply that this parameter is involving to add_statistics_threshold, it > > seems > > better for me to use more related name like add_statistics_suggest_only. > > > > Also, additional documentations for new parameters are required. > > Done. 
+ + + + auto_explain.auto_explain.add_statistics_threshold (real) + + auto_explain.add_statistics_threshold configuration parameter + + + + + auto_explain.add_statistics_threshold sets the threshold for + actual/estimated #rows ratio triggering creation of multicolumn statistic + for the related columns. It can be used for adpative query optimization. + If there is large gap between real and estimated number of tuples for the + concrete plan node, then multicolumn statistic is created for involved + attributes. Zero value (default) disables implicit creation of multicolumn statistic. + + I wonder we need to say that this parameter has no effect unless log_analyze is enabled and that statistics are created only when the excution time exceeds log_min_duration, if these behaviors are intentional. In addition, additional statistics are created only if #rows is over-estimated and not if it is under-estimated. Although it seems good as a criterion for creating
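The log line "Add statistics t_i_j" in the example earlier in this message corresponds to the auto_explain patch creating a multicolumn extended statistic on the two correlated columns. For comparison, the equivalent manual step would be roughly the following sketch; the statistic name comes from the log output, but the (dependencies) kind is an assumption, since the patch's exact CREATE STATISTICS statement is not shown here:

    CREATE STATISTICS t_i_j (dependencies) ON i, j FROM t;
    ANALYZE t;
    -- after this, the planner's row estimate for
    -- "WHERE i = 100 AND j = 10" improves as in the example above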
Re: [PATCH] remove pg_standby
On Wed, Jan 27, 2021 at 6:06 PM Michael Paquier wrote: > On Wed, Jan 27, 2021 at 04:13:24PM +1300, Thomas Munro wrote: > > I would like to commit this, because "waiting restore commands" have > > confusing interactions with my proposed prefetching-during-recovery > > patch[1]. Here's a version that fixes an error when building the docs > > (there was a stray remaining ), and adds a > > commit message. Any objections? > > It looks like you are missing two references in your patch set: > $ git grep pg_standby > doc/src/sgml/high-availability.sgml: Do not use pg_standby or > similar tools with the built-in standby mode > src/backend/access/transam/xlog.c: * segment. Only recycle normal > files, pg_standby for example can create Thanks, fixed. > The logic assumed in RemoveXlogFile() is actually a bit scary. I have > not checked in details but it could be possible to clean up more code > in this area? I think the check that it's a regular file is a good idea anyway, but I removed the offending comment. > > Furthermore, I think we should also remove the section of the manual > > that describes how to write your own "waiting restore command". > > Thoughts? > > Agreed. No objections to that. Thanks! From 77015331a71e67a4dee5352afc9f5faa05524755 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 26 Oct 2020 16:37:46 -0500 Subject: [PATCH v4 1/2] Retire pg_standby. Streaming replication made pg_standby obsolete. It has been proposed that we retire it many times. Now seems like a good time to do it, because "waiting restore commands" are incompatible with a proposed recovery prefetching feature. Discussion: https://postgr.es/m/20201029024412.GP5380%40telsasoft.com Author: Justin Pryzby Reviewed-by: Heikki Linnakangas Reviewed-by: Peter Eisentraut Reviewed-by: Michael Paquier --- contrib/Makefile | 1 - contrib/pg_standby/.gitignore | 1 - contrib/pg_standby/Makefile| 20 - contrib/pg_standby/pg_standby.c| 907 - doc/src/sgml/contrib.sgml | 7 +- doc/src/sgml/filelist.sgml | 1 - doc/src/sgml/high-availability.sgml| 17 +- doc/src/sgml/pgstandby.sgml| 394 --- doc/src/sgml/ref/pgarchivecleanup.sgml | 7 - src/backend/access/transam/xlog.c | 2 +- src/tools/msvc/Mkvcbuild.pm| 4 +- 11 files changed, 9 insertions(+), 1352 deletions(-) delete mode 100644 contrib/pg_standby/.gitignore delete mode 100644 contrib/pg_standby/Makefile delete mode 100644 contrib/pg_standby/pg_standby.c delete mode 100644 doc/src/sgml/pgstandby.sgml diff --git a/contrib/Makefile b/contrib/Makefile index 7a4866e338..cdc041c7db 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -33,7 +33,6 @@ SUBDIRS = \ pg_buffercache \ pg_freespacemap \ pg_prewarm \ - pg_standby \ pg_stat_statements \ pg_surgery \ pg_trgm \ diff --git a/contrib/pg_standby/.gitignore b/contrib/pg_standby/.gitignore deleted file mode 100644 index a401b085a8..00 --- a/contrib/pg_standby/.gitignore +++ /dev/null @@ -1 +0,0 @@ -/pg_standby diff --git a/contrib/pg_standby/Makefile b/contrib/pg_standby/Makefile deleted file mode 100644 index 87732bedf1..00 --- a/contrib/pg_standby/Makefile +++ /dev/null @@ -1,20 +0,0 @@ -# contrib/pg_standby/Makefile - -PGFILEDESC = "pg_standby - supports creation of a warm standby" -PGAPPICON = win32 - -PROGRAM = pg_standby -OBJS = \ - $(WIN32RES) \ - pg_standby.o - -ifdef USE_PGXS -PG_CONFIG = pg_config -PGXS := $(shell $(PG_CONFIG) --pgxs) -include $(PGXS) -else -subdir = contrib/pg_standby -top_builddir = ../.. 
-include $(top_builddir)/src/Makefile.global -include $(top_srcdir)/contrib/contrib-global.mk -endif diff --git a/contrib/pg_standby/pg_standby.c b/contrib/pg_standby/pg_standby.c deleted file mode 100644 index c9f33e4254..00 --- a/contrib/pg_standby/pg_standby.c +++ /dev/null @@ -1,907 +0,0 @@ -/* - * contrib/pg_standby/pg_standby.c - * - * - * pg_standby.c - * - * Production-ready example of how to create a Warm Standby - * database server using continuous archiving as a - * replication mechanism - * - * We separate the parameters for archive and nextWALfile - * so that we can check the archive exists, even if the - * WAL file doesn't (yet). - * - * This program will be executed once in full for each file - * requested by the warm standby server. - * - * It is designed to cater to a variety of needs, as well - * providing a customizable section. - * - * Original author: Simon Riggs si...@2ndquadrant.com - * Current maintainer: Simon Riggs - */ -#include "postgres_fe.h" - -#include -#include -#include -#include -#include -#include - -#include "access/xlog_internal.h" -#include "pg_getopt.h" - -const char *progname; - -int WalSegSz = -1; - -/* Options and defaults */ -int sleeptime = 5; /* amount of time to sleep between file checks */ -int waittime = -1; /* how long we have been
RE: [bug fix] ALTER TABLE SET LOGGED/UNLOGGED on a partitioned table does nothing silently
From: Kyotaro Horiguchi
> "CREATE TABLE" is not "CREATE LOGGED TABLE". We can assume that as
> "CREATE TABLE", where the default logged-ness varies according to context.
> Or it might have been so since the beginning.
> Currently we don't have the syntax "CREATE LOGGED TABLE", but we can add
> that syntax.

Yes, I thought of that possibility after a while too. But I felt a bit(?)
hesitant to do it considering back-patching. Also, the idea requires that
ALTER TABLE ATTACH PARTITION modify the logged-ness property of the target
partition and its subpartitions with that of the parent partitioned table.
However, your idea is one candidate worth pursuing, including whether or not
to back-patch what.

> We pursue releasing all fixes at once but we might release all fixes other
> than some items that cannot be fixed for some technical reasons at the time,
> like REPLICA IDENTITY.
>
> I'm not sure how long we will wait for the time of release, though.

Anyway, we have to define the ideal behavior for each ALTER action based on
some comprehensible principle. Yeah... this may become a long, tiring
journey. (I anticipate some difficulty and compromise in reaching agreement,
as was seen in the past discussion for the fix for ALTER TABLE REPLICA
IDENTITY. Scary)

FWIW, I wonder why Postgres decided to allow different logical structure of
tables, such as DEFAULT values and constraints, between the parent
partitioned table and a child partition. That extra flexibility might stand
in the way to consensus. I think it'd be easy to understand that partitions
are simply physically independent containers that have identical logical
structure.

Regards
Takayuki Tsunakawa
Re: [PATCH] remove pg_standby
On Wed, Jan 27, 2021 at 04:13:24PM +1300, Thomas Munro wrote:
> I would like to commit this, because "waiting restore commands" have
> confusing interactions with my proposed prefetching-during-recovery
> patch[1]. Here's a version that fixes an error when building the docs
> (there was a stray remaining ), and adds a
> commit message. Any objections?

It looks like you are missing two references in your patch set:
$ git grep pg_standby
doc/src/sgml/high-availability.sgml: Do not use pg_standby or similar tools with the built-in standby mode
src/backend/access/transam/xlog.c: * segment. Only recycle normal files, pg_standby for example can create

The logic assumed in RemoveXlogFile() is actually a bit scary. I have
not checked in details but it could be possible to clean up more code
in this area?

> Furthermore, I think we should also remove the section of the manual
> that describes how to write your own "waiting restore command".
> Thoughts?

Agreed. No objections to that.
--
Michael

signature.asc
Description: PGP signature
Re: WIP: WAL prefetch (another approach)
On Sat, Dec 5, 2020 at 7:27 AM Stephen Frost wrote:
> * Thomas Munro (thomas.mu...@gmail.com) wrote:
> > I just noticed this thread proposing to retire pg_standby on that
> > basis:
> >
> > https://www.postgresql.org/message-id/flat/20201029024412.GP5380%40telsasoft.com
> >
> > I'd be happy to see that land, to fix this problem with my plan. But
> > are there other people writing restore scripts that block that would
> > expect them to work on PG14?
>
> Ok, I think I finally get the concern that you're raising here-
> basically that if a restore command was written to sit around and wait
> for WAL segments to arrive, instead of just returning to PG and saying
> "WAL segment not found", that this would be a problem if we are running
> out ahead of the applying process and asking for WAL.
>
> The thing is- that's an outright broken restore command script in the
> first place. If PG is in standby mode, we'll ask again if we get an
> error result indicating that the WAL file wasn't found. The restore
> command documentation is quite clear on this point:
>
>     The command will be asked for file names that are not present in the
>     archive; it must return nonzero when so asked.
>
> There's no "it can wait around for the next file to show up if it wants
> to" in there- it *must* return nonzero when asked for files that don't
> exist.

Well the manual does actually describe how to write your own version of
pg_standby, referred to as a "waiting restore script":

https://www.postgresql.org/docs/13/log-shipping-alternative.html

I've now poked that other thread threatening to commit the removal of
pg_standby, and while I was there, also to remove the section on how to write
your own (it's possible that I missed some other reference to the concept
elsewhere, I'll need to take another look).

> So, I don't think that we really need to stress over this. The fact
> that pg_standby offers options to have it wait instead of just returning
> a non-zero error-code and letting the loop that we already do in the
> core code seems like it's really just a legacy thing from before we were
> doing that and probably should have been ripped out long ago... Even
> more reason to get rid of pg_standby tho, imv, we haven't been properly
> adjusting it when we've been making changes to the core code, it seems.

So far I haven't heard from anyone who thinks we should keep this old
facility (as useful as it was back then when it was the only way), so I hope
we can now quietly drop it. It's not strictly an obstacle to this recovery
prefetching work, but it'd interact confusingly in hard to describe ways, and
it seems strange to perpetuate something that many were already proposing to
drop due to obsolescence.

Thanks for the comments/sanity check.
Re: logical replication worker accesses catalogs in error context callback
On Wed, Jan 27, 2021 at 7:48 AM Zhihong Yu wrote: > For 0002, a few comments on the description: > > bq. Avoid accessing system catalogues inside conversion_error_callback > > catalogues -> catalog > > bq. error context callback, because the the entire transaction might > > There is redundant 'the' > > bq. Since get_attname() and get_rel_name() could search the sys cache by > store the required information > > store -> storing Thanks for pointing to the changes in the commit message. I corrected them. Attaching v4 patch set, consider it for further review. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com From 88767e3176cb6758ac3dd0a0da91f3747f84a4ed Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Tue, 26 Jan 2021 13:49:18 +0530 Subject: [PATCH v4] Avoid Catalogue Accesses In slot_store_error_callback Avoid accessing system catalogues inside slot_store_error_callback error context callback, because the the entire transaction might have been broken at this point. Since logicalrep_typmap_gettypname() could search the sys cache by calling to format_type_be(), store both local and remote type names to SlotErrCallbackArg so that we can just set the names in the error callback without sys cache lookup. Author: Masahiko Sawada --- src/backend/replication/logical/worker.c | 48 +--- 1 file changed, 27 insertions(+), 21 deletions(-) diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index eb7db89cef..f7c4fe9c4d 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -132,8 +132,9 @@ static dlist_head lsn_mapping = DLIST_STATIC_INIT(lsn_mapping); typedef struct SlotErrCallbackArg { LogicalRepRelMapEntry *rel; - int local_attnum; int remote_attnum; + char *local_typename; + char *remote_typename; } SlotErrCallbackArg; /* @@ -430,30 +431,19 @@ static void slot_store_error_callback(void *arg) { SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg; - LogicalRepRelMapEntry *rel; - char *remotetypname; - Oid remotetypoid, -localtypoid; + LogicalRepRelMapEntry *rel = errarg->rel; /* Nothing to do if remote attribute number is not set */ if (errarg->remote_attnum < 0) return; - rel = errarg->rel; - remotetypoid = rel->remoterel.atttyps[errarg->remote_attnum]; - - /* Fetch remote type name from the LogicalRepTypMap cache */ - remotetypname = logicalrep_typmap_gettypname(remotetypoid); - - /* Fetch local type OID from the local sys cache */ - localtypoid = get_atttype(rel->localreloid, errarg->local_attnum + 1); + Assert(errarg->local_typename && errarg->remote_typename); errcontext("processing remote data for replication target relation \"%s.%s\" column \"%s\", " "remote type %s, local type %s", rel->remoterel.nspname, rel->remoterel.relname, rel->remoterel.attnames[errarg->remote_attnum], - remotetypname, - format_type_be(localtypoid)); + errarg->remote_typename, errarg->local_typename); } /* @@ -474,8 +464,9 @@ slot_store_data(TupleTableSlot *slot, LogicalRepRelMapEntry *rel, /* Push callback + info on the error context stack */ errarg.rel = rel; - errarg.local_attnum = -1; errarg.remote_attnum = -1; + errarg.local_typename = NULL; + errarg.remote_typename = NULL; errcallback.callback = slot_store_error_callback; errcallback.arg = (void *) errcallback.previous = error_context_stack; @@ -491,11 +482,17 @@ slot_store_data(TupleTableSlot *slot, LogicalRepRelMapEntry *rel, if (!att->attisdropped && remoteattnum >= 0) { StringInfo colvalue = >colvalues[remoteattnum]; + Oid remotetypid 
= rel->remoterel.atttyps[remoteattnum]; Assert(remoteattnum < tupleData->ncols); - errarg.local_attnum = i; + /* + * Set the attribute number and both local and remote type names + * from the local sys cache and LogicalRepTypMap respectively. + */ errarg.remote_attnum = remoteattnum; + errarg.local_typename = format_type_be(att->atttypid); + errarg.remote_typename = logicalrep_typmap_gettypname(remotetypid); if (tupleData->colstatus[remoteattnum] == LOGICALREP_COLUMN_TEXT) { @@ -543,8 +540,9 @@ slot_store_data(TupleTableSlot *slot, LogicalRepRelMapEntry *rel, slot->tts_isnull[i] = true; } - errarg.local_attnum = -1; errarg.remote_attnum = -1; + errarg.local_typename = NULL; + errarg.remote_typename = NULL; } else { @@ -600,8 +598,9 @@ slot_modify_data(TupleTableSlot *slot, TupleTableSlot *srcslot, /* For error reporting, push callback + info on the error context stack */ errarg.rel = rel; - errarg.local_attnum = -1; errarg.remote_attnum = -1; + errarg.local_typename = NULL; + errarg.remote_typename = NULL; errcallback.callback = slot_store_error_callback; errcallback.arg = (void *) errcallback.previous = error_context_stack; @@ -622,9 +621,15 @@ slot_modify_data(TupleTableSlot *slot, TupleTableSlot
Re: Single transaction in the tablesync worker?
On Sat, Jan 23, 2021 at 5:56 PM Amit Kapila wrote:
>
> On Sat, Jan 23, 2021 at 4:55 AM Peter Smith wrote:
> >
> > PSA the v18 patch for the Tablesync Solution1.
>
> 7. Have you tested with the new patch the scenario where we crash
> after FINISHEDCOPY and before SYNCDONE, is it able to pick up the
> replication using the new temporary slot? Here, we need to test the
> case where during the catchup phase we have received few commits and
> then the tablesync worker is crashed/errored out? Basically, check if
> the replication is continued from the same point?

I have tested this and it didn't work, see the below example.

Publisher-side:
CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));

BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;

CREATE PUBLICATION mypublication FOR TABLE mytbl1;

Subscriber-side:
- Have a while(1) loop in LogicalRepSyncTableStart so that the tablesync worker stops.

CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));

CREATE SUBSCRIPTION mysub
       CONNECTION 'host=localhost port=5432 dbname=postgres'
       PUBLICATION mypublication;

During debug, stop after we mark the FINISHEDCOPY state.

Publisher-side:
INSERT INTO mytbl1(somedata, text) VALUES (1, 3);
INSERT INTO mytbl1(somedata, text) VALUES (1, 4);

Subscriber-side:
- Have a breakpoint in apply_dispatch
- continue in debugger;
- After we replay the first commit (which will be for values (1,3)), note down
  the origin position in apply_handle_commit_internal and somehow error out.
  I have forced the debugger to set to the last line in apply_dispatch where
  the error is raised.
- After the error, the tablesync worker is restarted again and it starts from
  the position noted in the previous step
- It exits without replaying the WAL for (1,4)

So, on the subscriber-side, you will see 3 records. The fourth is missing.
Now, if you insert more records on the publisher, it will anyway replay
those, but the fourth one got missed.

The temporary slots didn't seem to work because we create the new temporary
slot again after the crash and ask it to start decoding from the point we
noted in origin_lsn. The publisher didn’t hold the required WAL as our slot
was temporary, so it started sending from some later point. We retain WAL
based on the slots' restart_lsn position and wal_keep_size. For our case,
the positions of the slots will matter and, as we have created temporary
slots, there is no way for the publisher to save that WAL.

In this particular case, even if the WAL would have been there, we only pass
the start_decoding_at position but didn’t pass restart_lsn, so it picked a
random location (current insert position in WAL) which is ahead of the
start_decoding_at point, so it never sent the required fourth record. Now, I
don’t think it will work even if we somehow sent the correct restart_lsn,
because of what I wrote earlier: there is no guarantee that the earlier WAL
would have been saved. At this point, I can't think of any way to fix this
problem except for going back to the previous approach of permanent slots,
but let me know if you have any ideas to salvage this approach?

--
With Regards,
Amit Kapila.
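While reproducing scenarios like the above, it can help to watch the per-table synchronization state on the subscriber. A small sketch; note that the FINISHEDCOPY state ('f') is introduced by the patch set under discussion, while released versions only report states such as 'i', 'd', 's' and 'r':

    -- on the subscriber
    SELECT srrelid::regclass AS table_name, srsubstate
      FROM pg_subscription_rel;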
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On Wed, Jan 27, 2021 at 01:00:50AM +0300, Alexey Kondratov wrote:
> CheckRelationTableSpaceMove() does not feel like a right place for invoking
> post alter hooks. It is intended only to check for tablespace change
> possibility. Anyway, ATExecSetTableSpace() and
> ATExecSetTableSpaceNoStorage() already do that in the no-op case.
>
> I have removed this InvokeObjectPostAlterHook() from your 0001 and made 0002
> to work on top of it. I think that now it should look closer to what you
> described above.

Yeah, I was a bit hesitant about this part, as those new routines would not
be used by ALTER-related commands in the next steps. Your patch got that
midway in-between though, by adding the hook to SetRelationTableSpace() but
not to CheckRelationTableSpaceMove(). For now, I have kept the hook out of
those new routines because using an ALTER hook for a utility command is
inconsistent. Perhaps we'd want a separate hook type dedicated to utility
commands in objectaccess.c. I have double-checked the code, and applied it
after a few tweaks.

> In the new 0002 I moved ACL check to the upper level, i.e. ExecReindex(),
> and removed expensive text generation in test. Not touched yet some of your
> previously raised concerns. Also, you made SetRelationTableSpace() to accept
> Relation instead of Oid, so now we have to open/close indexes in the
> ReindexPartitions(), I am not sure that I use proper locking there, but it
> works.

Passing down Relation to the new routines makes the most sense to me because
we force the callers to think about the level of locking that's required
when doing any tablespace moves.

+   Relation iRel = index_open(partoid, ShareLock);
+
+   if (CheckRelationTableSpaceMove(iRel, params->tablespaceOid))
+       SetRelationTableSpace(iRel,
+                             params->tablespaceOid,
+                             InvalidOid);

Speaking of which, this breaks the locking assumptions of
SetRelationTableSpace(). I feel that we should think harder about this part
for partitioned indexes and tables because this looks rather unsafe in terms
of locking assumptions with partition trees. If we cannot come up with a safe
solution, I would be fine with disallowing TABLESPACE in this case, as a
first step. Not all problems have to be solved at once, and even without
this part the feature is still useful.

+   /* It's not a shared catalog, so refuse to move it to shared tablespace */
+   if (params->tablespaceOid == GLOBALTABLESPACE_OID)
+       ereport(ERROR,
+               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                errmsg("cannot move non-shared relation to tablespace \"%s\"",
+                       get_tablespace_name(params->tablespaceOid))));

Why is that needed if CheckRelationTableSpaceMove() is used?

    bits32      options;        /* bitmask of REINDEXOPT_* */
+   Oid         tablespaceOid;  /* tablespace to rebuild index */
} ReindexParams;

Mentioned upthread, but here I think that we should tell that InvalidOid =>
keep the existing tablespace.
--
Michael

signature.asc
Description: PGP signature
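For context, the user-facing command this patch series is building toward looks roughly like the following. This is a sketch of the syntax under discussion in the thread rather than text quoted from the patch, and the object names are made up:

    -- rebuild an index, or all indexes of a table, into a different tablespace
    REINDEX (TABLESPACE new_tablespace) INDEX my_index;
    REINDEX (TABLESPACE new_tablespace) TABLE my_table;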
Re: [PATCH] remove pg_standby
On Wed, Nov 25, 2020 at 10:04 PM Peter Eisentraut wrote: > On 2020-11-21 20:41, Justin Pryzby wrote: > > Oops. I guess I'd write something like this. If we just remove it, then > > there'd no place to add a new server application, and "client applications" > > would be the only subsection. > > I have committed the typo fix. I don't have a well-formed opinion yet > about whether all the reservations about removing pg_standby have been > addressed. I would like to commit this, because "waiting restore commands" have confusing interactions with my proposed prefetching-during-recovery patch[1]. Here's a version that fixes an error when building the docs (there was a stray remaining ), and adds a commit message. Any objections? Furthermore, I think we should also remove the section of the manual that describes how to write your own "waiting restore command". Thoughts? [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGKFeYPL9K%2BSRixcsx1%2B6HsHhqK%2BPOZyrnnZjw1jERpGcQ%40mail.gmail.com From 54b12290e304bf1d4c995836addb7a53e8e7b74b Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 26 Oct 2020 16:37:46 -0500 Subject: [PATCH v3] Retire pg_standby. Streaming replication made pg_standby obsolete. It has been proposed that we retire it many times. Now seems like a good time to do it, because "waiting restore commands" are incompatible with a proposed WAL prefetching feature. Discussion: https://postgr.es/m/20201029024412.GP5380%40telsasoft.com Author: Justin Pryzby Reviewed-by: Heikki Linnakangas Reviewed-by: Peter Eisentraut --- contrib/Makefile | 1 - contrib/pg_standby/.gitignore | 1 - contrib/pg_standby/Makefile| 20 - contrib/pg_standby/pg_standby.c| 907 - doc/src/sgml/contrib.sgml | 7 +- doc/src/sgml/filelist.sgml | 1 - doc/src/sgml/high-availability.sgml| 14 +- doc/src/sgml/pgstandby.sgml| 394 --- doc/src/sgml/ref/pgarchivecleanup.sgml | 7 - src/tools/msvc/Mkvcbuild.pm| 4 +- 10 files changed, 7 insertions(+), 1349 deletions(-) delete mode 100644 contrib/pg_standby/.gitignore delete mode 100644 contrib/pg_standby/Makefile delete mode 100644 contrib/pg_standby/pg_standby.c delete mode 100644 doc/src/sgml/pgstandby.sgml diff --git a/contrib/Makefile b/contrib/Makefile index 7a4866e338..cdc041c7db 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -33,7 +33,6 @@ SUBDIRS = \ pg_buffercache \ pg_freespacemap \ pg_prewarm \ - pg_standby \ pg_stat_statements \ pg_surgery \ pg_trgm \ diff --git a/contrib/pg_standby/.gitignore b/contrib/pg_standby/.gitignore deleted file mode 100644 index a401b085a8..00 --- a/contrib/pg_standby/.gitignore +++ /dev/null @@ -1 +0,0 @@ -/pg_standby diff --git a/contrib/pg_standby/Makefile b/contrib/pg_standby/Makefile deleted file mode 100644 index 87732bedf1..00 --- a/contrib/pg_standby/Makefile +++ /dev/null @@ -1,20 +0,0 @@ -# contrib/pg_standby/Makefile - -PGFILEDESC = "pg_standby - supports creation of a warm standby" -PGAPPICON = win32 - -PROGRAM = pg_standby -OBJS = \ - $(WIN32RES) \ - pg_standby.o - -ifdef USE_PGXS -PG_CONFIG = pg_config -PGXS := $(shell $(PG_CONFIG) --pgxs) -include $(PGXS) -else -subdir = contrib/pg_standby -top_builddir = ../.. 
-include $(top_builddir)/src/Makefile.global -include $(top_srcdir)/contrib/contrib-global.mk -endif diff --git a/contrib/pg_standby/pg_standby.c b/contrib/pg_standby/pg_standby.c deleted file mode 100644 index c9f33e4254..00 --- a/contrib/pg_standby/pg_standby.c +++ /dev/null @@ -1,907 +0,0 @@ -/* - * contrib/pg_standby/pg_standby.c - * - * - * pg_standby.c - * - * Production-ready example of how to create a Warm Standby - * database server using continuous archiving as a - * replication mechanism - * - * We separate the parameters for archive and nextWALfile - * so that we can check the archive exists, even if the - * WAL file doesn't (yet). - * - * This program will be executed once in full for each file - * requested by the warm standby server. - * - * It is designed to cater to a variety of needs, as well - * providing a customizable section. - * - * Original author: Simon Riggs si...@2ndquadrant.com - * Current maintainer: Simon Riggs - */ -#include "postgres_fe.h" - -#include -#include -#include -#include -#include -#include - -#include "access/xlog_internal.h" -#include "pg_getopt.h" - -const char *progname; - -int WalSegSz = -1; - -/* Options and defaults */ -int sleeptime = 5; /* amount of time to sleep between file checks */ -int waittime = -1; /* how long we have been waiting, -1 no wait - * yet */ -int maxwaittime = 0; /* how long are we prepared to wait for? */ -int keepfiles = 0; /* number of WAL files to keep, 0 keep all */ -int maxretries = 3; /* number of retries on restore command */ -bool debug = false; /* are we debugging? */ -bool need_cleanup =
RE: Parallel INSERT (INTO ... SELECT ...)
Hi, When testing the patch with the following kinds of SQL: --- Insert into part_table select 1; Insert into part_table select generate_series(1,1,1); Insert into part_table select * from testfunc(); --- we usually use these statements to initialize a table or for testing purposes. Personally I think we do not need to do the parallel safety check for these cases, because there is no chance for the SELECT part to be considered for parallelism. I thought we aim to skip the safety check unless a parallel plan is actually possible, so I was wondering whether it is possible to avoid the check in these cases. I did a quick check of the code; an immature idea is to check whether the query contains any RTE_RELATION range table entry, and if not, skip the safety check. I am not sure whether it is worth doing that, any thoughts? Best regards, Houzj
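A minimal sketch of the RTE_RELATION idea above, for illustration only: the helper name and its placement are made up, and a real check would also have to recurse into subqueries and CTEs (for example via range_table_walker()), which this toy version does not do.

#include "postgres.h"
#include "nodes/parsenodes.h"
#include "nodes/pg_list.h"

/*
 * Hypothetical helper: does the query reference any plain relation?
 * Only then would running the parallel safety check be worthwhile.
 * Note that this only inspects the top-level range table.
 */
static bool
query_references_relation(Query *query)
{
    ListCell   *lc;

    foreach(lc, query->rtable)
    {
        RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);

        if (rte->rtekind == RTE_RELATION)
            return true;
    }

    return false;
}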
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
On Tue, Jan 26, 2021, at 12:59, Mark Rofail wrote: > Please don't hesitate to give your feedback. The error message for insert or update violations looks fine: UPDATE catalog_clone.pg_extension SET extconfig = ARRAY[12345] WHERE oid = 10; ERROR: insert or update on table "pg_extension" violates foreign key constraint "pg_extension_extconfig_fkey" DETAIL: Key (EACH ELEMENT OF extconfig)=(12345) is not present in table "pg_class". But when trying to delete a still referenced row, the column mentioned in the "EACH ELEMENT OF" sentence is not the array column, but the column in the referenced table: DELETE FROM catalog_clone.pg_class WHERE oid = 10; ERROR: update or delete on table "pg_class" violates foreign key constraint "pg_extension_extconfig_fkey" on table "pg_extension" DETAIL: Key (EACH ELEMENT OF oid)=(10) is still referenced from table "pg_extension". I think either the "EACH ELEMENT OF" part of the latter error message should be dropped, or the column name for the array column should be used. /Joel
Re: Add SQL function for SHA1
On Tue, Jan 26, 2021 at 8:53 PM Michael Paquier wrote: > On Tue, Jan 26, 2021 at 10:38:43AM +0100, Daniel Gustafsson wrote: > > Agreed, and pgcrypto already allows for using sha1. > > > > It seems like any legitimate need for sha1 could be better served by an > > extension rather than supplying it in-core. > > Both of you telling the same thing is enough for me to discard this > new stuff. I'd like to refactor the code anyway as that's a nice > cleanup, and this would have the advantage to make people look at > cryptohashfuncs.c if introducing a new type. After sleeping about it, > I think that I would just make MD5 and SHA1 issue an elog(ERROR) if > the internal routine is taken in those cases, like in the attached. > The refactor patch looks good. It builds and passes make check. Thanks for the enum explanation too. Regards, -- Sehrope Sarkuni Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
Re: archive status ".ready" files may be created too early
At Tue, 26 Jan 2021 19:13:57 +, "Bossart, Nathan" wrote in > On 12/17/20, 9:15 PM, "Kyotaro Horiguchi" wrote: > > At Thu, 17 Dec 2020 22:20:35 +, "Bossart, Nathan" > > wrote in > >> On 12/15/20, 2:33 AM, "Kyotaro Horiguchi" wrote: > >> > You're right in that regard. There's a window where partial record is > >> > written when write location passes F0 after insertion location passes > >> > F1. However, remembering all spanning records seems overkilling to me. > >> > >> I'm curious why you feel that recording all cross-segment records is > >> overkill. IMO it seems far simpler to just do that rather than try to > > > > Sorry, my words are not enough. Remembering all spanning records in > > *shared memory* seems to be overkilling. Much more if it is stored in > > shared hash table. Even though it rarely the case, it can fail hard > > way when reaching the limit. If we could do well by remembering just > > two locations, we wouldn't need to worry about such a limitation. > > I don't think it will fail if we reach max_size for the hash table. > The comment above ShmemInitHash() has this note: > > * max_size is the estimated maximum number of hashtable entries. This is > * not a hard limit, but the access efficiency will degrade if it is > * exceeded substantially (since it's used to compute directory size and > * the hash table buckets will get overfull). That description means that a shared hash has a directory with fixed size thus there may be synonyms, which causes degradation. Even though buckets are preallocated with the specified number, since the minimum directory size is 256, buckets are allocated at least 256 in a long run. Minimum on-the-fly allocation size is 32. I haven't calcuated further precicely, but I'm worried about the amount of spare shared memory the hash can allocate. > > Another concern about the concrete patch: > > > > NotifySegmentsReadyForArchive() searches the shared hashacquiaing a > > LWLock every time XLogWrite is called while segment archive is being > > held off. I don't think it is acceptable and I think it could be a > > problem when many backends are competing on WAL. > > This is a fair point. I did some benchmarking with a few hundred > connections all doing writes, and I was not able to discern any > noticeable performance impact. My guess is that contention on this > new lock is unlikely because callers of XLogWrite() must already hold > WALWriteLock. Otherwise, I believe we only acquire ArchNotifyLock no > more than once per segment to record new record boundaries. Thanks. I agree that the reader-reader contention is not a problem due to existing serialization by WALWriteLock. Adding an entry happens only at segment boundary so the ArchNotifyLock doesn't seem to be a problem. However the function prolongs the WALWriteLock section. Couldn't we somehow move the call to NotifySegmentsReadyForArchive in XLogWrite out of the WALWriteLock section? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
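For readers who have not seen the patch, a shared hash of cross-segment record boundaries would be created along these lines; the entry layout and the sizes shown are placeholders for illustration, not the patch's actual definitions. The "max_size is only a soft limit" behaviour quoted above is exactly what the ShmemInitHash() arguments express.

#include "postgres.h"
#include "access/xlogdefs.h"
#include "storage/shmem.h"
#include "utils/hsearch.h"

/* Illustrative entry type: which segment ends mid-record, and where the
 * spanning record actually ends.  Not the patch's real struct. */
typedef struct RecordBoundaryEntry
{
    XLogSegNo   seg;            /* hash key */
    XLogRecPtr  endpos;         /* end of the record spanning this segment */
} RecordBoundaryEntry;

static HTAB *RecordBoundaryHash;

static void
RecordBoundaryShmemInit(void)
{
    HASHCTL     ctl;

    memset(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(XLogSegNo);
    ctl.entrysize = sizeof(RecordBoundaryEntry);

    /* init_size and max_size are made-up small numbers; max_size is a
     * soft limit used to size the directory, not a hard cap */
    RecordBoundaryHash = ShmemInitHash("record boundary hash",
                                       16, 16,
                                       &ctl,
                                       HASH_ELEM | HASH_BLOBS);
}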
Re: logical replication worker accesses catalogs in error context callback
For 0002, a few comments on the description: bq. Avoid accessing system catalogues inside conversion_error_callback catalogues -> catalog bq. error context callback, because the the entire transaction might There is redundant 'the' bq. Since get_attname() and get_rel_name() could search the sys cache by store the required information store -> storing Cheers On Tue, Jan 26, 2021 at 5:17 PM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > On Tue, Jan 26, 2021 at 11:29 AM Masahiko Sawada > wrote: > > > IMHO, adding an assertion in SearchCatCacheInternal(which is a most > > > generic code part within the server) with a few error context global > > > variables may not be always safe. Because we might miss using the > > > error context variables properly. Instead, we could add a comment in > > > ErrorContextCallback structure saying something like, "it's not > > > recommended to access any system catalogues within an error context > > > callback when the callback is expected to be called while processing > > > an error, because the transaction might have been broken in that > > > case." And let the future callback developers take care of it. > > > > > > Thoughts? > > > > That sounds good to me. But I still also see the value to add an > > assertion in SearchCatCacheInternal(). If we had it, we could find > > these two bugs earlier. > > > > Anyway, this seems to be unrelated to this bug fixing so we can start > > a new thread for that. > > +1 to start a new thread for that. > > > > As I said earlier [1], currently only two(there could be more) error > > > context callbacks access the sys catalogues, one is in > > > slot_store_error_callback which will be fixed with your patch. Another > > > is in conversion_error_callback, we can also fix this by storing the > > > relname, attname and other required info in ConversionLocation, > > > something like the attached. Please have a look. > > > > > > Thoughts? > > > > Thank you for the patch! > > > > Here are some comments: > > Thanks for the review comments. > > > +static void set_error_callback_info(ConversionLocation *errpos, > > + Relation rel, int cur_attno, > > + ForeignScanState *fsstate); > > > > I'm concerned a bit that the function name is generic. How about > > set_conversion_error_callback_info() or something to make the name > > clear? > > Done. > > > --- > > +static void > > +conversion_error_callback(void *arg) > > +{ > > + ConversionLocation *errpos = (ConversionLocation *) arg; > > + > > + if (errpos->show_generic_message) > > + errcontext("processing expression at position %d in > > select list", > > + errpos->cur_attno); > > + > > > > I think we can set this generic message to the error context when > > errpos->relname is NULL instead of using show_generic_message. > > Right. Done. > > > --- > > + /* > > +* Set error context callback info, so that we could avoid > accessing > > +* the system catalogues while processing the error in > > +* conversion_error_callback. System catalogue accesses are not > safe in > > +* error context callbacks because the transaction might have > been > > +* broken by then. > > +*/ > > + set_error_callback_info(, rel, i, fsstate); > > > > Looking at other code, we use "catalog" rather than "catalogue" in > > most places. Is it better to use "catalog" for consistency? FYI, the > > "catalogue" is used at only three places in the server code: > > Changed it to "catalog". > > > FYI I've added those bug fixes to the next Commitfest - > https://commitfest.postgresql.org/32/2955/ > > Thanks. 
I'm attaching 2 patches to make it easy for reviewing and also > they will get a chance to be tested by cf bot. > > 0001 - for avoiding system catalog access in slot_store_error_callback. > 0002 - for avoiding system catalog access in conversion_error_callback > > Please review it further. > > With Regards, > Bharath Rupireddy. > EnterpriseDB: http://www.enterprisedb.com >
Re: Add SQL function for SHA1
On Tue, Jan 26, 2021 at 10:38:43AM +0100, Daniel Gustafsson wrote: > Agreed, and pgcrypto already allows for using sha1. > > It seems like any legitimate need for sha1 could be better served by an > extension rather than supplying it in-core. Both of you telling the same thing is enough for me to discard this new stuff. I'd like to refactor the code anyway as that's a nice cleanup, and this would have the advantage to make people look at cryptohashfuncs.c if introducing a new type. After sleeping about it, I think that I would just make MD5 and SHA1 issue an elog(ERROR) if the internal routine is taken in those cases, like in the attached. If there are any comments or objections to the refactoring piece, please let me know. -- Michael diff --git a/src/backend/utils/adt/cryptohashfuncs.c b/src/backend/utils/adt/cryptohashfuncs.c index d99485f4c6..4bb684a08b 100644 --- a/src/backend/utils/adt/cryptohashfuncs.c +++ b/src/backend/utils/adt/cryptohashfuncs.c @@ -68,6 +68,60 @@ md5_bytea(PG_FUNCTION_ARGS) PG_RETURN_TEXT_P(cstring_to_text(hexsum)); } +/* + * Internal routine to compute a cryptohash with the given bytea input. + */ +static inline bytea * +cryptohash_internal(pg_cryptohash_type type, bytea *input) +{ + const uint8 *data; + const char *typestr = NULL; + int digest_len = 0; + size_t len; + pg_cryptohash_ctx *ctx; + bytea *result; + + switch (type) + { + case PG_SHA224: + typestr = "SHA224"; + digest_len = PG_SHA224_DIGEST_LENGTH; + break; + case PG_SHA256: + typestr = "SHA256"; + digest_len = PG_SHA256_DIGEST_LENGTH; + break; + case PG_SHA384: + typestr = "SHA384"; + digest_len = PG_SHA384_DIGEST_LENGTH; + break; + case PG_SHA512: + typestr = "SHA512"; + digest_len = PG_SHA512_DIGEST_LENGTH; + break; + case PG_MD5: + case PG_SHA1: + elog(ERROR, "unsupported digest type %d", type); + break; + } + + result = palloc0(digest_len + VARHDRSZ); + len = VARSIZE_ANY_EXHDR(input); + data = (unsigned char *) VARDATA_ANY(input); + + ctx = pg_cryptohash_create(type); + if (pg_cryptohash_init(ctx) < 0) + elog(ERROR, "could not initialize %s context", typestr); + if (pg_cryptohash_update(ctx, data, len) < 0) + elog(ERROR, "could not update %s context", typestr); + if (pg_cryptohash_final(ctx, (unsigned char *) VARDATA(result)) < 0) + elog(ERROR, "could not finalize %s context", typestr); + pg_cryptohash_free(ctx); + + SET_VARSIZE(result, digest_len + VARHDRSZ); + + return result; +} /* * SHA-2 variants @@ -76,28 +130,7 @@ md5_bytea(PG_FUNCTION_ARGS) Datum sha224_bytea(PG_FUNCTION_ARGS) { - bytea *in = PG_GETARG_BYTEA_PP(0); - const uint8 *data; - size_t len; - pg_cryptohash_ctx *ctx; - unsigned char buf[PG_SHA224_DIGEST_LENGTH]; - bytea *result; - - len = VARSIZE_ANY_EXHDR(in); - data = (unsigned char *) VARDATA_ANY(in); - - ctx = pg_cryptohash_create(PG_SHA224); - if (pg_cryptohash_init(ctx) < 0) - elog(ERROR, "could not initialize %s context", "SHA224"); - if (pg_cryptohash_update(ctx, data, len) < 0) - elog(ERROR, "could not update %s context", "SHA224"); - if (pg_cryptohash_final(ctx, buf) < 0) - elog(ERROR, "could not finalize %s context", "SHA224"); - pg_cryptohash_free(ctx); - - result = palloc(sizeof(buf) + VARHDRSZ); - SET_VARSIZE(result, sizeof(buf) + VARHDRSZ); - memcpy(VARDATA(result), buf, sizeof(buf)); + bytea *result = cryptohash_internal(PG_SHA224, PG_GETARG_BYTEA_PP(0)); PG_RETURN_BYTEA_P(result); } @@ -105,28 +138,7 @@ sha224_bytea(PG_FUNCTION_ARGS) Datum sha256_bytea(PG_FUNCTION_ARGS) { - bytea *in = PG_GETARG_BYTEA_PP(0); - const uint8 *data; - size_t len; - pg_cryptohash_ctx 
*ctx; - unsigned char buf[PG_SHA256_DIGEST_LENGTH]; - bytea *result; - - len = VARSIZE_ANY_EXHDR(in); - data = (unsigned char *) VARDATA_ANY(in); - - ctx = pg_cryptohash_create(PG_SHA256); - if (pg_cryptohash_init(ctx) < 0) - elog(ERROR, "could not initialize %s context", "SHA256"); - if (pg_cryptohash_update(ctx, data, len) < 0) - elog(ERROR, "could not update %s context", "SHA256"); - if (pg_cryptohash_final(ctx, buf) < 0) - elog(ERROR, "could not finalize %s context", "SHA256"); - pg_cryptohash_free(ctx); - - result = palloc(sizeof(buf) + VARHDRSZ); - SET_VARSIZE(result, sizeof(buf) + VARHDRSZ); - memcpy(VARDATA(result), buf, sizeof(buf)); + bytea *result = cryptohash_internal(PG_SHA256, PG_GETARG_BYTEA_PP(0)); PG_RETURN_BYTEA_P(result); } @@ -134,28 +146,7 @@ sha256_bytea(PG_FUNCTION_ARGS) Datum sha384_bytea(PG_FUNCTION_ARGS) { - bytea *in = PG_GETARG_BYTEA_PP(0); - const uint8 *data; - size_t len; - pg_cryptohash_ctx *ctx; - unsigned char buf[PG_SHA384_DIGEST_LENGTH]; - bytea *result; - - len = VARSIZE_ANY_EXHDR(in); - data = (unsigned char *) VARDATA_ANY(in); - - ctx = pg_cryptohash_create(PG_SHA384); - if (pg_cryptohash_init(ctx) < 0) - elog(ERROR, "could not initialize %s context", "SHA384"); - if (pg_cryptohash_update(ctx, data, len) < 0) - elog(ERROR, "could not update %s context", "SHA384"); - if
Re: wiki:PostgreSQL_14_Open_Items
On Tue, Jan 26, 2021 at 03:55:15PM -0600, Justin Pryzby wrote: > I've created a new page, and added some unresolved items that I've been > keeping > in my head. > > https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items Thanks, Justin! -- Michael signature.asc Description: PGP signature
Re: Transactions involving multiple postgres foreign servers, take 2
On 2021/01/18 14:54, Masahiko Sawada wrote: On Fri, Jan 15, 2021 at 7:45 AM Zhihong Yu wrote: For v32-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch : + entry->changing_xact_state = true; ... + entry->changing_xact_state = abort_cleanup_failure; I don't see return statement in between the two assignments. I wonder why entry->changing_xact_state is set to true, and later being assigned again. Because postgresRollbackForeignTransaction() can get called again in case where an error occurred during aborting and cleanup the transaction. For example, if an error occurred when executing ABORT TRANSACTION (pgfdw_get_cleanup_result() could emit an ERROR), postgresRollbackForeignTransaction() will get called again while entry->changing_xact_state is still true. Then the entry will be caught by the following condition and cleaned up: /* * If connection is before starting transaction or is already unsalvageable, * do only the cleanup and don't touch it further. */ if (entry->changing_xact_state) { pgfdw_cleanup_after_transaction(entry); return; } For v32-0007-Introduce-foreign-transaction-launcher-and-resol.patch : bq. This commits introduces to new background processes: foreign commits introduces to new -> commit introduces two new Fixed. +FdwXactExistsXid(TransactionId xid) Since Xid is the parameter to this method, I think the Xid suffix can be dropped from the method name. But there is already a function named FdwXactExists()? bool FdwXactExists(Oid dbid, Oid serverid, Oid userid) As far as I read other code, we already have such functions that have the same functionality but have different arguments. For instance, SearchSysCacheExists() and SearchSysCacheExistsAttName(). So I think we can leave as it is but is it better to have like FdwXactCheckExistence() and FdwXactCheckExistenceByXid()? + * Portions Copyright (c) 2020, PostgreSQL Global Development Group Please correct year in the next patch set. Fixed. +FdwXactLauncherRequestToLaunch(void) Since the launcher's job is to 'launch', I think the Launcher can be omitted from the method name. Agreed. How about FdwXactRequestToLaunchResolver()? +/* Report shared memory space needed by FdwXactRsoverShmemInit */ +Size +FdwXactRslvShmemSize(void) Are both Rsover and Rslv referring to resolver ? It would be better to use whole word which reduces confusion. Plus, FdwXactRsoverShmemInit should be FdwXactRslvShmemInit (or FdwXactResolveShmemInit) Agreed. I realized that these functions are the launcher's function, not resolver's. So I'd change to FdwXactLauncherShmemSize() and FdwXactLauncherShmemInit() respectively. +fdwxact_launch_resolver(Oid dbid) The above method is not in camel case. It would be better if method names are consistent (in casing). Fixed. +errmsg("out of foreign transaction resolver slots"), +errhint("You might need to increase max_foreign_transaction_resolvers."))); It would be nice to include the value of max_foreign_xact_resolvers I agree it would be nice but looking at other code we don't include the value in this kind of messages. For fdwxact_resolver_onexit(): + LWLockAcquire(FdwXactLock, LW_EXCLUSIVE); + fdwxact->locking_backend = InvalidBackendId; + LWLockRelease(FdwXactLock); There is no call to method inside the for loop which may take time. I wonder if the lock can be obtained prior to the for loop and released coming out of the for loop. Agreed. +FXRslvLoop(void) Please use Resolver instead of Rslv Fixed. + FdwXactResolveFdwXacts(held_fdwxacts, nheld); Fdw and Xact are repeated twice each in the method name. 
Probably the method name can be made shorter. Fixed. You fixed some issues. But maybe you forgot to attach the latest patches? I'm reading 0001 and 0002 patches to pick up the changes for postgres_fdw that worth applying independent from 2PC feature. If there are such changes, IMO we can apply them in advance, and which would make the patches simpler. + if (PQresultStatus(res) != PGRES_COMMAND_OK) + ereport(ERROR, (errmsg("could not commit transaction on server %s", + frstate->server->servername))); You changed the code this way because you want to include the server name in the error message? I agree that it's helpful to report also the server name that caused an error. OTOH, since this change gets rid of call to pgfdw_rerport_error() for the returned PGresult, the reported error message contains less information. If this understanding is right, I don't think that this change is an improvement. Instead, if the server name should be included in the error message, pgfdw_report_error() should be changed so that it also reports the server name? If we do that, the server name is reported not only when COMMIT fails but also when
Re: logical replication worker accesses catalogs in error context callback
On Tue, Jan 26, 2021 at 11:29 AM Masahiko Sawada wrote: > > IMHO, adding an assertion in SearchCatCacheInternal(which is a most > > generic code part within the server) with a few error context global > > variables may not be always safe. Because we might miss using the > > error context variables properly. Instead, we could add a comment in > > ErrorContextCallback structure saying something like, "it's not > > recommended to access any system catalogues within an error context > > callback when the callback is expected to be called while processing > > an error, because the transaction might have been broken in that > > case." And let the future callback developers take care of it. > > > > Thoughts? > > That sounds good to me. But I still also see the value to add an > assertion in SearchCatCacheInternal(). If we had it, we could find > these two bugs earlier. > > Anyway, this seems to be unrelated to this bug fixing so we can start > a new thread for that. +1 to start a new thread for that. > > As I said earlier [1], currently only two(there could be more) error > > context callbacks access the sys catalogues, one is in > > slot_store_error_callback which will be fixed with your patch. Another > > is in conversion_error_callback, we can also fix this by storing the > > relname, attname and other required info in ConversionLocation, > > something like the attached. Please have a look. > > > > Thoughts? > > Thank you for the patch! > > Here are some comments: Thanks for the review comments. > +static void set_error_callback_info(ConversionLocation *errpos, > + Relation rel, int cur_attno, > + ForeignScanState *fsstate); > > I'm concerned a bit that the function name is generic. How about > set_conversion_error_callback_info() or something to make the name > clear? Done. > --- > +static void > +conversion_error_callback(void *arg) > +{ > + ConversionLocation *errpos = (ConversionLocation *) arg; > + > + if (errpos->show_generic_message) > + errcontext("processing expression at position %d in > select list", > + errpos->cur_attno); > + > > I think we can set this generic message to the error context when > errpos->relname is NULL instead of using show_generic_message. Right. Done. > --- > + /* > +* Set error context callback info, so that we could avoid accessing > +* the system catalogues while processing the error in > +* conversion_error_callback. System catalogue accesses are not safe > in > +* error context callbacks because the transaction might have been > +* broken by then. > +*/ > + set_error_callback_info(, rel, i, fsstate); > > Looking at other code, we use "catalog" rather than "catalogue" in > most places. Is it better to use "catalog" for consistency? FYI, the > "catalogue" is used at only three places in the server code: Changed it to "catalog". > FYI I've added those bug fixes to the next Commitfest - > https://commitfest.postgresql.org/32/2955/ Thanks. I'm attaching 2 patches to make it easy for reviewing and also they will get a chance to be tested by cf bot. 0001 - for avoiding system catalog access in slot_store_error_callback. 0002 - for avoiding system catalog access in conversion_error_callback Please review it further. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com v3-0001-Avoid-Catalogue-Accesses-In-slot_store_error_call.patch Description: Binary data v3-0002-Avoid-Catalogue-Accesses-In-conversion_error_call.patch Description: Binary data
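To visualize the approach being discussed, here is a trimmed sketch of what the conversion callback ends up looking like once the relation and attribute names are captured ahead of time; the struct fields and message wording are illustrative rather than the patch text. The point is that the callback only formats strings it already has, so it never touches the syscache while an error is being processed.

#include "postgres.h"

/* Illustrative only: names captured before any error can occur. */
typedef struct ConversionLocation
{
    int         cur_attno;      /* attribute number being processed */
    const char *relname;        /* relation name, captured up front */
    const char *attname;        /* attribute name, captured up front */
} ConversionLocation;

static void
conversion_error_callback(void *arg)
{
    ConversionLocation *errpos = (ConversionLocation *) arg;

    if (errpos->relname && errpos->attname)
        errcontext("column \"%s\" of foreign table \"%s\"",
                   errpos->attname, errpos->relname);
    else
        errcontext("processing expression at position %d in select list",
                   errpos->cur_attno);
}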
Re: Protect syscache from bloating with negative cache entries
At Thu, 14 Jan 2021 17:32:27 +0900 (JST), Kyotaro Horiguchi wrote in > The commit 4656e3d668 (debug_invalidate_system_caches_always) > conflicted with this patch. Rebased. At Wed, 27 Jan 2021 10:07:47 +0900 (JST), Kyotaro Horiguchi wrote in > (I found a bug in a benchmark-aid function > (CatalogCacheFlushCatalog2), I repost an updated version soon.) I noticed that a catcachebench-aid function CatalogCacheFlushCatalog2() allocates bucked array wrongly in the current memory context, which leads to a crash. This is a fixed it then rebased version. -- Kyotaro Horiguchi NTT Open Source Software Center >From 5f318170b9c1e0caa1033862261800f06135e5bd Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Wed, 18 Nov 2020 16:54:31 +0900 Subject: [PATCH v7 1/3] CatCache expiration feature --- src/backend/access/transam/xact.c | 3 ++ src/backend/utils/cache/catcache.c | 87 +- src/backend/utils/misc/guc.c | 12 + src/include/utils/catcache.h | 19 +++ 4 files changed, 120 insertions(+), 1 deletion(-) diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index a2068e3fd4..86888d2409 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -1086,6 +1086,9 @@ static void AtStart_Cache(void) { AcceptInvalidationMessages(); + + if (xactStartTimestamp != 0) + SetCatCacheClock(xactStartTimestamp); } /* diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index fa2b49c676..644d92dd9a 100644 --- a/src/backend/utils/cache/catcache.c +++ b/src/backend/utils/cache/catcache.c @@ -38,6 +38,7 @@ #include "utils/rel.h" #include "utils/resowner_private.h" #include "utils/syscache.h" +#include "utils/timestamp.h" /* #define CACHEDEBUG */ /* turns DEBUG elogs on */ @@ -60,9 +61,19 @@ #define CACHE_elog(...) #endif +/* + * GUC variable to define the minimum age of entries that will be considered + * to be evicted in seconds. -1 to disable the feature. + */ +int catalog_cache_prune_min_age = -1; +uint64 prune_min_age_us; + /* Cache management header --- pointer is NULL until created */ static CatCacheHeader *CacheHdr = NULL; +/* Clock for the last accessed time of a catcache entry. */ +uint64 catcacheclock = 0; + static inline HeapTuple SearchCatCacheInternal(CatCache *cache, int nkeys, Datum v1, Datum v2, @@ -74,6 +85,7 @@ static pg_noinline HeapTuple SearchCatCacheMiss(CatCache *cache, Index hashIndex, Datum v1, Datum v2, Datum v3, Datum v4); +static bool CatCacheCleanupOldEntries(CatCache *cp); static uint32 CatalogCacheComputeHashValue(CatCache *cache, int nkeys, Datum v1, Datum v2, Datum v3, Datum v4); @@ -99,6 +111,15 @@ static void CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos, static void CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos, Datum *srckeys, Datum *dstkeys); +/* GUC assign function */ +void +assign_catalog_cache_prune_min_age(int newval, void *extra) +{ + if (newval < 0) + prune_min_age_us = UINT64_MAX; + else + prune_min_age_us = ((uint64) newval) * USECS_PER_SEC; +} /* * internal support functions @@ -1264,6 +1285,9 @@ SearchCatCacheInternal(CatCache *cache, */ dlist_move_head(bucket, >cache_elem); + /* Record the last access timestamp */ + ct->lastaccess = catcacheclock; + /* * If it's a positive entry, bump its refcount and return it. If it's * negative, we can report failure to the caller. 
@@ -1425,6 +1449,61 @@ SearchCatCacheMiss(CatCache *cache, return >tuple; } +/* + * CatCacheCleanupOldEntries - Remove infrequently-used entries + * + * Catcache entries happen to be left unused for a long time for several + * reasons. Remove such entries to prevent catcache from bloating. It is based + * on the similar algorithm with buffer eviction. Entries that are accessed + * several times in a certain period live longer than those that have had less + * access in the same duration. + */ +static bool +CatCacheCleanupOldEntries(CatCache *cp) +{ + int nremoved = 0; + int i; + long oldest_ts = catcacheclock; + uint64 prune_threshold = catcacheclock - prune_min_age_us; + + /* Scan over the whole hash to find entries to remove */ + for (i = 0 ; i < cp->cc_nbuckets ; i++) + { + dlist_mutable_iter iter; + + dlist_foreach_modify(iter, >cc_bucket[i]) + { + CatCTup*ct = dlist_container(CatCTup, cache_elem, iter.cur); + + /* Don't remove referenced entries */ + if (ct->refcount == 0 && +(ct->c_list == NULL || ct->c_list->refcount == 0)) + { +if (ct->lastaccess < prune_threshold) +{ + CatCacheRemoveCTup(cp, ct); + nremoved++; + + /* don't let the removed entry update oldest_ts */ + continue; +} + } + + /* update the oldest timestamp if the entry remains alive */ + if (ct->lastaccess < oldest_ts) +oldest_ts = ct->lastaccess; + } + } + +
Re: Protect syscache from bloating with negative cache entries
At Tue, 26 Jan 2021 11:43:21 +0200, Heikki Linnakangas wrote in > Hi, > > On 19/11/2020 07:25, Kyotaro Horiguchi wrote: > > Performance measurement on the attached showed better result about > > searching but maybe worse for cache entry creation. Each time number > > is the mean of 10 runs. > > # Cacache (negative) entry creation > > : time(ms) (% to master) > > master : 3965.61(100.0) > > patched-off: 4040.93(101.9) > > patched-on : 4032.22(101.7) > > # Searching negative cache entries > > master : 8173.46(100.0) > > patched-off: 7983.43( 97.7) > > patched-on : 8049.88( 98.5) > > # Creation, searching and expiration > > master : 6393.23(100.0) > > patched-off: 6527.94(102.1) > > patched-on : 15880.01(248.4) > > That is, catcache searching gets faster by 2-3% but creation gets > > slower by about 2%. If I moved the condition of 2 further up to > > CatalogCacheCreateEntry(), that degradation reduced to 0.6%. > > # Cacache (negative) entry creation > > master : 3967.45 (100.0) > > patched-off : 3990.43 (100.6) > > patched-on : 4108.96 (103.6) > > # Searching negative cache entries > > master : 8106.53 (100.0) > > patched-off : 8036.61 ( 99.1) > > patched-on : 8058.18 ( 99.4) > > # Creation, searching and expiration > > master : 6395.00 (100.0) > > patched-off : 6416.57 (100.3) > > patched-on : 15830.91 (247.6) > > Can you share the exact script or steps to reproduce these numbers? I > presume these are from the catcachebench extension, but I can't figure > out which scenario above corresponds to which catcachebench > test. Also, catcachebench seems to depend on a bunch of tables being > created in schema called "test"; what tables did you use for the above > numbers? gen_tbl.pl to generate the tables, then run2.sh to run the benchmark. sumlog.pl to summarize the result of run2.sh. $ ./gen_tbl.pl | psql postgres $ ./run2.sh | tee rawresult.txt | ./sumlog.pl (I found a bug in a benchmark-aid function (CatalogCacheFlushCatalog2), I repost an updated version soon.) Simple explanation follows since the scripts are a kind of crappy.. run2.sh: LOOPS: # of execution of catcachebench() in a single run USES : Take the average of this number of fastest executions in a single run. BINROOT : Common parent directory of target binaries. DATADIR : Data directory. (shared by all binaries) PREC : FP format for time and percentage in a result. TESTS: comma-separated numbers given to catcachebench. The "run" function spec run "binary-label" where A, B and C are the value for catalog_cache_prune_min_age. "" means no setting (used for master binary). Currently only C is in effect but all the three should be non-empty string to make it effective. The result output is: test | version | n |r | stddev --+-+-+--+- 1 | patched-off | 1/3 | 14211.96 | 261.19 test : # of catcachebench(#) version: binary label given to the run function n : USES / LOOPS r : result average time of catcachebench() in milliseconds stddev : stddev of regards. -- Kyotaro Horiguchi NTT Open Source Software Center #! 
/usr/bin/perl $collist = ""; foreach $i (0..1000) { $collist .= sprintf(", c%05d int", $i); } $collist = substr($collist, 2); printf "drop schema if exists test cascade;\n"; printf "create schema test;\n"; foreach $i (0..2999) { printf "create table test.t%04d ($collist);\n", $i; } #!/bin/bash LOOPS=3 USES=1 TESTS=1,2,3 BINROOT=/home/horiguti/bin DATADIR=/home/horiguti/data/data_work PREC="numeric(10,2)" /usr/bin/killall postgres /usr/bin/sleep 3 run() { local BINARY=$1 local PGCTL=$2/bin/pg_ctl local PGSQL=$2/bin/postgres local PSQL=$2/bin/psql if [ "$3" != "" ]; then local SETTING1="set catalog_cache_prune_min_age to \"$3\";" local SETTING2="set catalog_cache_prune_min_age to \"$4\";" local SETTING3="set catalog_cache_prune_min_age to \"$5\";" fi # ($PGSQL -D $DATADIR 2>&1 > /dev/null)& ($PGSQL -D $DATADIR 2>&1 > /dev/null | /usr/bin/sed -e 's/^/# /')& /usr/bin/sleep 3 ${PSQL} postgres <&1 > /dev/null | /usr/bin/sed -e 's/^/# /' # oreport > $BINARY_perf.txt } for i in $(seq 0 2); do run "patched-off" $BINROOT/pgsql_catexp "-1" "-1" "-1" run "patched-on" $BINROOT/pgsql_catexp "0" "0" "0" run "master" $BINROOT/pgsql_master_o2 "" "" "" done #! /usr/bin/perl while () { # if (/^\s+([0-9])\s*\|\s*(\w+)\s*\|\s*(\S+)\s*\|\s*([\d.]+)\s*\|\s*(\w+)\s*$/) { if (/^\s+([0-9])\s*\|\s*(\S+)\s*\|\s*(\S+)\s*\|\s*([\d.]+)\s*\|\s*([\d.]+)\s*$/) { $test = $1; $bin = $2; $time = $4; if (defined $sum{$test}{$bin}) { $sum{$test}{$bin} +=
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On Tue, Jan 26, 2021 at 8:38 AM Bharath Rupireddy wrote: > I will post "keep_connections" GUC and "keep_connection" server level > option patches later. Attaching v19 patch set for "keep_connections" GUC and "keep_connection" server level option. Please review them further. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com v19-0002-postgres_fdw-add-keep_connections-GUC-to-not-cac.patch Description: Binary data v19-0003-postgres_fdw-server-level-option-keep_connection.patch Description: Binary data
Re: Single transaction in the tablesync worker?
On Mon, Jan 25, 2021 at 4:48 PM Amit Kapila wrote: > > On Mon, Jan 25, 2021 at 8:03 AM Peter Smith wrote: > > > > Hi Amit. > > > > PSA the v19 patch for the Tablesync Solution1. > > > > I see one race condition in this patch where we try to drop the origin > via apply process and DropSubscription. I think it can lead to the > error "cache lookup failed for replication origin with oid %u". The > same problem can happen via exposed API pg_replication_origin_drop but > probably because this is not used concurrently so nobody faced this > issue. I think for the matter of this patch we can try to suppress > such an error either via try..catch, or by adding missing_ok argument > to replorigin_drop API, or we can just add to comments that such a > race exists. Additionally, we should try to start a new thread for the > existence of this problem in pg_replication_origin_drop. What do you > think? OK. A new thread [ps0127] for this problem was started --- [ps0127] = https://www.postgresql.org/message-id/CAHut%2BPuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: simplifying foreign key/RI checks
Hi Amit-san, On 2021/01/25 18:19, Amit Langote wrote: On Mon, Jan 25, 2021 at 9:24 AM Corey Huinker wrote: On Sun, Jan 24, 2021 at 6:51 AM Amit Langote wrote: Here's v5. v5 patches apply to master. Suggested If/then optimization is implemented. Suggested case merging is implemented. Passes make check and make check-world yet again. Just to confirm, we don't free the RI_CompareHashEntry because it points to an entry in a hash table which is TopMemoryContext aka lifetime of the session, correct? Right. Anybody else want to look this patch over before I mark it Ready For Committer? Would be nice to have others look it over. Thanks. Thanks for creating the patch! I tried to review the patch. Here is my comment. * According to this thread [1], it might be better to replace elog() with ereport() in the patch. [1]: https://www.postgresql.org/message-id/flat/92d6f545-5102-65d8-3c87-489f71ea0a37%40enterprisedb.com Thanks, Tatsuro Yamada
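As a small illustration of the suggested change (the message text, error code, and the refname variable here are made up, not taken from the patch):

/* elog() is meant for internal "should not happen" errors ... */
elog(ERROR, "referenced row not found in table \"%s\"", refname);

/* ... while ereport() produces a proper, translatable, user-facing error */
ereport(ERROR,
        (errcode(ERRCODE_FOREIGN_KEY_VIOLATION),
         errmsg("referenced row not found in table \"%s\"", refname)));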
pg_replication_origin_drop API potential race condition
Hi Hackers. As discovered elsewhere [ak0125], there is a potential race condition in the pg_replication_origin_drop API. The current code of pg_replication_origin_drop looks like: roident = replorigin_by_name(name, false); Assert(OidIsValid(roident)); replorigin_drop(roident, true); Users cannot deliberately drop a non-existent origin (replorigin_by_name passes missing_ok = false), but there is still a small window in which concurrent processes may be able to call replorigin_drop for the same valid roident. Locking within replorigin_drop guards against concurrent drops, so the 1st execution will succeed, but the 2nd execution would then give an internal cache error: elog(ERROR, "cache lookup failed for replication origin with oid %u", roident); Some ideas to fix this include: 1. Do nothing except write a comment about this in the code. The internal ERROR is not ideal for a user API, but there is no great harm done. 2. Change the behavior of replorigin_drop to be like replorigin_drop_IF_EXISTS, so the 2nd execution of this race would silently do nothing when it finds the roident is already gone. 3. Same as 2, but make the NOP behavior more explicit by introducing a new "missing_ok" parameter for replorigin_drop. Thoughts? [ak0125] https://www.postgresql.org/message-id/CAA4eK1%2ByeLwBCkTvTdPM-hSk1fr6jT8KJc362CN8zrGztq_JqQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
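To make option 3 concrete, here is a rough sketch of how the SQL-callable wrapper could use it. The third replorigin_drop() argument is the hypothetical missing_ok flag that this proposal would add; everything else mirrors the existing function.

#include "postgres.h"
#include "fmgr.h"
#include "replication/origin.h"
#include "utils/builtins.h"

/*
 * Sketch only: assumes replorigin_drop() has grown a missing_ok parameter
 * (option 3 above).  With missing_ok = true, losing the race to a
 * concurrent drop becomes a no-op instead of the internal cache error.
 */
Datum
pg_replication_origin_drop(PG_FUNCTION_ARGS)
{
    char       *name = text_to_cstring(PG_GETARG_TEXT_PP(0));
    RepOriginId roident;

    roident = replorigin_by_name(name, false);
    Assert(OidIsValid(roident));

    replorigin_drop(roident, true, true);   /* nowait, missing_ok (hypothetical) */

    pfree(name);

    PG_RETURN_VOID();
}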
Re: WIP: BRIN multi-range indexes
On 1/26/21 7:52 PM, John Naylor wrote: On Fri, Jan 22, 2021 at 10:59 PM Tomas Vondra mailto:tomas.von...@enterprisedb.com>> wrote: > > > On 1/23/21 12:27 AM, John Naylor wrote: > > Still, it would be great if multi-minmax can be a drop in replacement. I > > know there was a sticking point of a distance function not being > > available on all types, but I wonder if that can be remedied or worked > > around somehow. > > > > Hmm. I think Alvaro also mentioned he'd like to use this as a drop-in > replacement for minmax (essentially, using these opclasses as the > default ones, with the option to switch back to plain minmax). I'm not > convinced we should do that - though. Imagine you have minmax indexes in > your existing DB, it's working perfectly fine, and then we come and just > silently change that during dump/restore. Is there some past example > when we did something similar and it turned it to be OK? I was assuming pg_dump can be taught to insert explicit opclasses for minmax indexes, so that upgrade would not cause surprises. If that's true, only new indexes would have the different default opclass. Maybe, I suppose we could do that. But I always found such changes happening silently in the background a bit suspicious, because it may be quite confusing. I certainly wouldn't expect such difference between creating a new index and index created by dump/restore. Did we do such changes in the past? That might be a precedent, but I don't recall any example ... > As for the distance functions, I'm pretty sure there are data types > without "natural" distance - like most strings, for example. We could > probably invent something, but the question is how much we can rely on > it working well enough in practice. > > Of course, is minmax even the right index type for such data types? > Strings are usually "labels" and not queried using range queries, > although sometimes people encode stuff as strings (but then it's very > unlikely we'll define the distance definition well). So maybe for those > types a hash / bloom would be a better fit anyway. Right. > But I do have an idea - maybe we can do without distances, in those > cases. Essentially, the primary issue of minmax indexes are outliers, so > what if we simply sort the values, keep one range in the middle and as > many single points on each tail? That's an interesting idea. I think it would be a nice bonus to try to do something along these lines. On the other hand, I'm not the one volunteering to do the work, and the patch is useful as is. IMO it's fairly small amount of code, so I'll take a stab at in in the next version of the patch. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
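As a thought experiment for the no-distance-function case, the summary described above could look something like this on plain integers. This is purely a toy sketch of the data layout with made-up names, not the multi-minmax opclass code.

#include <assert.h>
#include <stdlib.h>

#define TOY_MAX_TAIL_POINTS 4

typedef struct ToySummary
{
    int     points[2 * TOY_MAX_TAIL_POINTS];    /* exact values kept from the tails */
    int     npoints;
    int     range_lo;       /* middle range, inclusive */
    int     range_hi;
} ToySummary;

static int
cmp_int(const void *a, const void *b)
{
    int     x = *(const int *) a;
    int     y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Sort the block range's values, keep up to k exact points from each tail
 * (the would-be outliers), and one [lo,hi] interval for everything else. */
static void
build_toy_summary(int *values, int nvalues, int k, ToySummary *out)
{
    int     i;

    assert(k <= TOY_MAX_TAIL_POINTS && nvalues > 2 * k);    /* skip degenerate cases */

    qsort(values, nvalues, sizeof(int), cmp_int);

    out->npoints = 0;
    for (i = 0; i < k; i++)
        out->points[out->npoints++] = values[i];            /* low-tail outliers */
    for (i = nvalues - k; i < nvalues; i++)
        out->points[out->npoints++] = values[i];            /* high-tail outliers */

    out->range_lo = values[k];
    out->range_hi = values[nvalues - k - 1];
}

A consistency check against such a summary is then simply "the value falls inside [range_lo, range_hi] or equals one of the tail points", which needs only the type's ordering and no distance function.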
Re: mkid reference
On Tue, Jan 26, 2021 at 10:19:44PM +0100, Magnus Hagander wrote: > On Mon, Jan 25, 2021 at 4:40 PM Bruce Momjian wrote: > > > > On Sun, Jan 24, 2021 at 02:20:58PM +0100, Magnus Hagander wrote: > > > While at it, what point is "codelines" adding? > > > > That is the script I use to generate code line counts when comparing > > releases. I thought it should be in the tree so others can reproduce my > > numbers. > > Not that it particularly matters to keep it, but wouldn't something > like cloc give a much better number? Yes, we could, but we didn't really have any criteria on exactly what to count, so I just counted physical lines. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Online checksums patch - once again
On 22/01/2021 14:21, Heikki Linnakangas wrote: On 22/01/2021 13:55, Heikki Linnakangas wrote: I read through the latest patch, v31-0001-Support-checksum-enable-disable-in-a-running-clu.patch. Some comments below: One more thing: In SetRelationNumChecks(), you should use SearchSysCacheCopy1() to get a modifiable copy of the tuple. Otherwise you modify the tuple in the relcache as a side effect. Maybe that's harmless in this case, as the 'relhaschecksums' value in the relcache isn't used for anything, but let's be tidy. Sorry, I meant SetRelHasChecksums. There is no SetRelationNumChecks function, I don't know where I got that from. - Heikki
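For reference, the pattern being asked for looks roughly like the following. The function name and the relhaschecksums column belong to the patch under review, so this is only a sketch of the shape, not the actual patch code.

#include "postgres.h"
#include "access/htup_details.h"
#include "access/table.h"
#include "catalog/indexing.h"
#include "catalog/pg_class.h"
#include "utils/syscache.h"

static void
SetRelHasChecksums(Oid relid, bool enable)
{
    Relation    pg_class;
    HeapTuple   tuple;
    Form_pg_class relform;

    pg_class = table_open(RelationRelationId, RowExclusiveLock);

    /* Copy, so we don't scribble on the tuple cached in the syscache/relcache */
    tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
    if (!HeapTupleIsValid(tuple))
        elog(ERROR, "cache lookup failed for relation %u", relid);

    relform = (Form_pg_class) GETSTRUCT(tuple);
    relform->relhaschecksums = enable;      /* column added by the patch */
    CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);

    heap_freetuple(tuple);
    table_close(pg_class, RowExclusiveLock);
}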
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On 2021-01-26 09:58, Michael Paquier wrote: On Mon, Jan 25, 2021 at 11:11:38PM +0300, Alexey Kondratov wrote: I updated comment with CCI info, did pgindent run and renamed new function to SetRelationTableSpace(). New patch is attached. [...] Yeah, all these checks we complicated from the beginning. I will try to find a better place tomorrow or put more info into the comments at least. I was reviewing that, and I think that we can do a better consolidation on several points that will also help the features discussed on this thread for VACUUM, CLUSTER and REINDEX. If you look closely, ATExecSetTableSpace() uses the same logic as the code modified here to check if a relation can be moved to a new tablespace, with extra checks for mapped relations, GLOBALTABLESPACE_OID or if attempting to manipulate a temp relation from another session. There are two differences though: - Custom actions are taken between the phase where we check if a relation can be moved to a new tablespace, and the update of pg_class. - ATExecSetTableSpace() needs to be able to set a given relation relfilenode on top of reltablespace, the newly-created one. So I think that the heart of the problem is made of two things here: - We should have one common routine for the existing code paths and the new code paths able to check if a tablespace move can be done or not. The case of a cluster, reindex or vacuum on a list of relations extracted from pg_class would still require a different handling as incorrect relations have to be skipped, but the case of individual relations can reuse the refactoring pieces done here (see CheckRelationTableSpaceMove() in the attached). - We need to have a second routine able to update reltablespace and optionally relfilenode for a given relation's pg_class entry, once the caller has made sure that CheckRelationTableSpaceMove() validates a tablespace move. I think that I got your idea. One comment: +bool +CheckRelationTableSpaceMove(Relation rel, Oid newTableSpaceId) +{ + Oid oldTableSpaceId; + Oid reloid = RelationGetRelid(rel); + + /* +* No work if no change in tablespace. Note that MyDatabaseTableSpace +* is stored as 0. +*/ + oldTableSpaceId = rel->rd_rel->reltablespace; + if (newTableSpaceId == oldTableSpaceId || + (newTableSpaceId == MyDatabaseTableSpace && oldTableSpaceId == 0)) + { + InvokeObjectPostAlterHook(RelationRelationId, reloid, 0); + return false; + } CheckRelationTableSpaceMove() does not feel like a right place for invoking post alter hooks. It is intended only to check for tablespace change possibility. Anyway, ATExecSetTableSpace() and ATExecSetTableSpaceNoStorage() already do that in the no-op case. Please note that was a bug in your previous patch 0002: shared dependencies need to be registered if reltablespace is updated of course, but also iff the relation has no physical storage. So changeDependencyOnTablespace() requires a check based on RELKIND_HAS_STORAGE(), or REINDEX would have registered shared dependencies even for relations with storage, something we don't want per the recent work done by Alvaro in ebfe2db. Yes, thanks. I have removed this InvokeObjectPostAlterHook() from your 0001 and made 0002 to work on top of it. I think that now it should look closer to what you described above. In the new 0002 I moved ACL check to the upper level, i.e. ExecReindex(), and removed expensive text generation in test. Not touched yet some of your previously raised concerns. 
Also, you made SetRelationTableSpace() to accept Relation instead of Oid, so now we have to open/close indexes in the ReindexPartitions(), I am not sure that I use proper locking there, but it works. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres CompanyFrom 96a37399a9cf9ae08d62e28496e73b36087e5a19 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 26 Jan 2021 15:53:06 +0900 Subject: [PATCH v7 1/2] Refactor code to detect and process tablespace moves --- src/backend/commands/tablecmds.c | 218 +-- src/include/commands/tablecmds.h | 4 + 2 files changed, 127 insertions(+), 95 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 8687e9a97c..c08eedf995 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -3037,6 +3037,116 @@ SetRelationHasSubclass(Oid relationId, bool relhassubclass) table_close(relationRelation, RowExclusiveLock); } +/* + * CheckRelationTableSpaceMove + * Check if relation can be moved to new tablespace. + * + * NOTE: caller must be holding an appropriate lock on the relation. + * ShareUpdateExclusiveLock is sufficient to prevent concurrent schema + * changes. + * + * Returns true if the relation can be moved to the new tablespace; + * false
wiki:PostgreSQL_14_Open_Items
I've created a new page, and added some unresolved items that I've been keeping in my head. https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items
Re: strange error reporting
Alvaro Herrera writes: > pgbench has one occurrence of the old pattern in master, in line 6043. > However, since doConnect() returns NULL when it gets CONNECTION_BAD, > that seems dead code. This patch kills it. Oh ... I missed that because it wasn't adjacent to the PQconnectdbParams call :-(. You're right, that's dead code and we should just delete it. regards, tom lane
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
Justin Pryzby writes: > I have nothing new to add, but wanted to point out this is still an issue. > This is on the v13 Opened Items list - for lack of anywhere else to put them, > I > also added two other, unresolved issues. It's probably time to make a v14 open items page, and move anything you want to treat as a live issue to there. regards, tom lane
Re: mkid reference
On Mon, Jan 25, 2021 at 4:40 PM Bruce Momjian wrote: > > On Sun, Jan 24, 2021 at 02:20:58PM +0100, Magnus Hagander wrote: > > While at it, what point is "codelines" adding? > > That is the script I use to generate code line counts when comparing > releases. I thought it should be in the tree so others can reproduce my > numbers. Not that it particularly matters to keep it, but wouldn't something like cloc give a much better number? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: mkid reference
On Tue, Jan 26, 2021 at 6:58 PM Bruce Momjian wrote: > > On Tue, Jan 26, 2021 at 05:03:30PM +0100, Magnus Hagander wrote: > > On Mon, Jan 25, 2021 at 4:38 PM Bruce Momjian wrote: > > > > > > On Fri, Jan 22, 2021 at 01:07:36PM -0500, Tom Lane wrote: > > > > > There's also src/tools/make_mkid which use this mkid tool. +1 for > > > > > removing. > > > > > If anything, it seems better replaced by extended documentation on > > > > > the existing > > > > > wiki article [0] on how to use "git format-patch". > > > > > > > > I found man pages for mkid online --- it's apparently a ctags-like > > > > code indexing tool, not something for patches. So maybe Bruce still > > > > uses it, or maybe not. But as long as we've also got make_ctags and > > > > > > Yes, I do still use it, so I thought having a script to generate its > > > index files might be helpful to someone. > > > > Where do you actually get it? The old docs (now removed) suggested > > getting it off ftp.postgresql.org... > > Not sure why it was on our ftp site, since it is a GNU download: > > https://www.gnu.org/software/idutils/ Ah, good. Then at least we now have it in the list archives for reference if somebody else searches for it :) And no, it wasn't actually on our ftp server. But it might have been at some point far far in the past... -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: shared tempfile was not removed on statement_timeout
On Wed, Jan 27, 2021 at 12:22 AM Kyotaro Horiguchi wrote: > At Tue, 26 Jan 2021 11:00:56 +0200, Heikki Linnakangas > wrote in > > Don't we potentially have the same problem with all on_dsm_detach > > callbacks? Looking at the other on_dsm_detach callbacks, I don't see > > any CHECK_FOR_INTERRUPT() calls in them, but it seems fragile to > > assume that. > > > > I'd suggest adding HOLD/RESUME_INTERRUPTS() to dsm_detach(). At least > > around the removal of the callback from the list and calling the > > callback. Maybe even over the whole function. > > Yes, I first came up with HOLD/RESUME_QUERY_INTERRUPTS() to the same > location. +1, this seems like a good idea. This is a little bit like the code near the comments "Don't joggle the elbow of proc_exit".
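A minimal sketch of what that guard might look like, with the callback loop paraphrased from dsm.c and only the HOLD/RESUME calls added; whether the hold should cover the whole function or just this loop is the open question above.

/* Sketch only; this lives inside dsm.c, where dsm_segment and
 * dsm_segment_detach_callback are defined. */
void
dsm_detach(dsm_segment *seg)
{
    HOLD_INTERRUPTS();

    while (!slist_is_empty(&seg->on_detach))
    {
        slist_node *node = slist_pop_head_node(&seg->on_detach);
        dsm_segment_detach_callback *cb =
            slist_container(dsm_segment_detach_callback, node, node);
        on_dsm_detach_callback function = cb->function;
        Datum       arg = cb->arg;

        /* pop before invoking, as today, then run the callback with
         * interrupts held so it cannot throw out from under us */
        pfree(cb);
        function(seg, arg);
    }

    RESUME_INTERRUPTS();

    /* ... the rest of dsm_detach() (unmap, bookkeeping) unchanged ... */
}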
Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
I have nothing new to add, but wanted to point out this is still an issue. This is on the v13 Opened Items list - for lack of anywhere else to put them, I also added two other, unresolved issues. https://wiki.postgresql.org/index.php?title=PostgreSQL_13_Open_Items=revision=35624=35352 On Tue, Aug 04, 2020 at 06:00:34PM -0400, Tom Lane wrote: > Peter Geoghegan writes: > > On Tue, Aug 4, 2020 at 1:31 PM Tom Lane wrote: > >> while the INSERT_IN_PROGRESS case has none. Simple symmetry > >> would suggest that the INSERT_IN_PROGRESS case should be > >> > >>if (checking_uniqueness || > >>HeapTupleIsHeapOnly(heapTuple)) > >>// wait > > > I had exactly the same intuition. > > >> but I'm not 100% convinced that that's right. > > > Why doubt that explanation? > > First, it's not clear that this is an exact inverse, because > HeapTupleIsHotUpdated does more than check the HOT_UPDATED flag. > Second, I think there remains some doubt as to whether the > DELETE_IN_PROGRESS case is right either. If we were forcing > a wait for *every* in-doubt HOT-chain element, not only non-last > ones (or non-first ones for the INSERT case, if we use the above > fix) then it'd be quite clear that we're safe. But if we want > to keep the optimization then I think maybe closer analysis is > warranted.
Re: Key management with tests
On Tue, Jan 26, 2021 at 11:15 AM Bruce Momjian wrote: > This version fixes OpenSSL detection and improves docs for initdb > interactions. Hi, I'm wondering whether you've considered storing all the keys in one file instead of a file per key. The reason I ask is that it seems to me that the key rotation procedure would be a lot simpler if it were all in one file. You could just create a temporary file and atomically rename it over the existing file. If you see a temporary file you're always free to remove it. This is a lot simpler than what you have right now. The "repair" concept pretty much goes away completely, which seems nice. Granted I don't know exactly how to store multiple keys in one file, but I bet there's some way to do it. The way in which you are posting these patches is quite unlike what most people do when posting patches to this list. You seem to have generated a bunch of patches using 'git format-patch' but then concatenated them all together in a single file. It would be helpful if you could do this more like the way that is now standard on this list. Not only that, but the patches don't have meaningful commit messages in them, and don't seem to be meaningfully split for easy review. They just say things like 'crypto squash commit'. Compare this to for example what I did on the "cleaning up a few CLOG-related things" thread where the commits appear in a logical sequence, and each one has a meaningful commit message. Or here's an example from someone else -- http://postgr.es/m/be72abfa-e62e-eb81-4e70-1b57fe6dc...@amazon.com -- and note the inclusion of authorship information in the commit messages, so that the source of the code can be easily understood. The README in src/backend/crypto does not explain how the scripts in that directory are intended to be used. If I want to use AWS Secrets Manager with this feature, I can see that I should use ckey_aws.sh.sample as a basis for that integration, but I don't know what I should do with the script because the README says nothing about it. I am frankly pretty doubtful about the idea of shipping a bunch of /bin/sh scripts as a best practice; for one thing, that's totally unusable on Windows, and it also means that this is dependent on /bin/sh existing and having the behavior we expect and on all the other executables in these scripts as well. But, at the very least, there needs to be a clearer explanation of how the scripts are intended to be used, which parts people are supposed to modify, what arguments they're going to get called with, and things like that. The comments in cipher.c and cipher_openssl.c could be improved to explain that they are alternatives to each other. Perhaps the former could be renamed to something like cipher_failure.c or cipher_noimpl.c for clarity. I believe that a StaticAssertStmt could be used to check the length of the encryption_methods[] array, so that if someone changes NUM_ENCRYPTION_METHODS without updating the array, compilation fails. See UserAuthName[] for an example of how to do this. You seem to have omitted to update the documentation with the names of the new wait events that you added. In process_postgres_switches(), when there's a multi-line comment followed by a single line of actual code, I prefer to include braces around the whole thing. There might be some disagreement on what is best here, though. What are the consequences of the placement of the code in PostgresMain() for processes other than user backends and walsenders? 
I think that the way you have it, background workers would not have access to keys, nor auxiliary processes like the checkpointer ... at least in the EXEC_BACKEND case. In the non-EXEC_BACKEND case you have the postmaster doing it, so then I'm not sure why it has to be redone for every backend. Won't they just inherit the data from the postmaster? Has this code been meaningfully tested on Windows? How do we know that it works? Maybe we need to think about adding some asserts that guarantee that any process that attempts to access a buffer has the key manager initialized; I bet such assertions would fail at least on Windows as the code looks now. I don't think it makes sense to think about committing this to v14. I believe it only makes sense if we have a TDE patch that is relatively close to committable that can be used with it. I also don't think that this patch is in good enough shape to commit yet in terms of where it's at in terms of quality; I think it needs more review first, hopefully including review from people who can comment intelligently specifically on the cryptography aspects of it. However, the challenges don't seem insurmountable. There's also still some question in my mind about whether the design choices here (one KEK, 2 DEKs, one for data and one for WAL) have enough consensus. I don't have a considered opinion on that, partly because I'm not quite sure what the reasonable alternatives are, but it seems that other people had some questions about
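To illustrate the static-assertion suggestion above, a minimal sketch follows; the array contents and the NUM_ENCRYPTION_METHODS definition are placeholders taken from the review's wording, not from the actual patch, and the file-scope StaticAssertDecl() variant is used here rather than StaticAssertStmt().

	#include "postgres.h"

	#define NUM_ENCRYPTION_METHODS 3	/* placeholder; defined by the patch */

	static const char *const encryption_methods[] = {
		"none",							/* placeholder entries */
		"aes-128",
		"aes-256"
	};

	/* compilation fails if the array and the constant get out of sync */
	StaticAssertDecl(lengthof(encryption_methods) == NUM_ENCRYPTION_METHODS,
					 "encryption_methods[] must match NUM_ENCRYPTION_METHODS");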
Re: strange error reporting
On 2021-Jan-20, Robert Haas wrote: > On Wed, Jan 20, 2021 at 1:54 PM Tom Lane wrote: > > Alvaro Herrera writes: > > > Well, the patch seems small enough, and I don't think it'll be in any > > > way helpful to omit that detail. > > > > I'm +1 for applying and back-patching that. I still think we might > > want to just drop the phrase altogether in HEAD, but we wouldn't do > > that in the back branches, and the message is surely misleading as-is. > > Sure, that makes sense. OK, I pushed it. Thanks, pgbench has one occurrence of the old pattern in master, in line 6043. However, since doConnect() returns NULL when it gets CONNECTION_BAD, that seems dead code. This patch kills it. -- Álvaro Herrera 39°49'30"S 73°17'W "I can see support will not be a problem. 10 out of 10." (Simon Wittber) (http://archives.postgresql.org/pgsql-general/2004-12/msg00159.php)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 1be1ad3d6d..a4a3f40048 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -6038,17 +6038,10 @@ main(int argc, char **argv)
 	/* opening connection... */
 	con = doConnect();
 	if (con == NULL)
 		exit(1);
 
-	if (PQstatus(con) == CONNECTION_BAD)
-	{
-		pg_log_fatal("connection to database \"%s\" failed: %s",
-					 dbName, PQerrorMessage(con));
-		exit(1);
-	}
-
 	if (internal_script_used)
 		GetTableInfo(con, scale_given);
 
 	/*
 	 * :scale variables normally get -s or database scale, but don't override
Re: making update/delete of inheritance trees scale better
On Fri, Oct 30, 2020 at 6:26 PM Heikki Linnakangas wrote: > Yeah, you need to access the old tuple to update its t_ctid, but > accessing it twice is still more expensive than accessing it once. Maybe > you could optimize it somewhat by keeping the buffer pinned or > something. Or push the responsibility down to the table AM, passing the > AM only the modified columns, and let the AM figure out how to deal with > the columns that were not modified, hoping that it can do something smart. Just as a point of possible interest, back when I was working on zheap, I sort of wanted to take this in the opposite direction. In effect, a zheap tuple has system columns that don't exist for a heap tuple, and you can't do an update or delete without knowing what the values for those columns are, so zheap had to just refetch the tuple, but that sucked in comparisons with the existing heap, which didn't have to do the refetch. At the time, I thought maybe the right idea would be to extend things so that a table AM could specify an arbitrary set of system columns that needed to be bubbled up to the point where the update or delete happens, but that seemed really complicated to implement and I never tried. Here it seems like we're thinking of going the other way, and just always doing the refetch. That is of course fine for zheap comparative benchmarks: instead of making zheap faster, we just make the heap slower! Well, sort of. I didn't think about the benefits of the refetch approach when the tuples are wide. That does cast a somewhat different light on things. I suppose we could have both methods and choose the one that seems likely to be faster in particular cases, but that seems like way too much machinery. Maybe there's some way to further optimize accessing the same tuple multiple times in rapid succession to claw back some of the lost performance in the slow cases, but I don't have a specific idea. -- Robert Haas EDB: http://www.enterprisedb.com
Re: archive status ".ready" files may be created too early
On 1/2/21, 8:55 AM, "Andrey Borodin" wrote: > Recently we had somewhat related incident. Do I understand correctly that > this incident is related to the bug discussed in this thread? I'm not sure that we can rule it out, but the log pattern I've typically seen for this is "invalid contrecord length." The issue is that we're marking segments as ready for archive when the segment is fully written versus when its WAL records are fully written (since its WAL records may cross into the next segment). The fact that you're seeing zeroes at the end of your archived segments leads me to think it is unlikely that you are experiencing this bug. Nathan
Re: archive status ".ready" files may be created too early
On 12/17/20, 9:15 PM, "Kyotaro Horiguchi" wrote: > At Thu, 17 Dec 2020 22:20:35 +, "Bossart, Nathan" > wrote in >> On 12/15/20, 2:33 AM, "Kyotaro Horiguchi" wrote: >> > You're right in that regard. There's a window where partial record is >> > written when write location passes F0 after insertion location passes >> > F1. However, remembering all spanning records seems overkilling to me. >> >> I'm curious why you feel that recording all cross-segment records is >> overkill. IMO it seems far simpler to just do that rather than try to > > Sorry, my words are not enough. Remembering all spanning records in > *shared memory* seems to be overkilling. Much more if it is stored in a > shared hash table. Even though it is rarely the case, it can fail in a hard > way when reaching the limit. If we could do well by remembering just > two locations, we wouldn't need to worry about such a limitation. I don't think it will fail if we reach max_size for the hash table. The comment above ShmemInitHash() has this note: * max_size is the estimated maximum number of hashtable entries. This is * not a hard limit, but the access efficiency will degrade if it is * exceeded substantially (since it's used to compute directory size and * the hash table buckets will get overfull). > Another concern about the concrete patch: > > NotifySegmentsReadyForArchive() searches the shared hash, acquiring a > LWLock, every time XLogWrite is called while segment archiving is being > held off. I don't think it is acceptable and I think it could be a > problem when many backends are competing on WAL. This is a fair point. I did some benchmarking with a few hundred connections all doing writes, and I was not able to discern any noticeable performance impact. My guess is that contention on this new lock is unlikely because callers of XLogWrite() must already hold WALWriteLock. Otherwise, I believe we acquire ArchNotifyLock no more than once per segment to record new record boundaries. Nathan
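For reference, a shared hash of the kind being discussed would be set up roughly as below. Only ShmemInitHash() and its flags are real; the entry type, hash name, and sizing are hypothetical, and the note quoted above about max_size not being a hard limit applies to the third argument.

	#include "postgres.h"
	#include "access/xlogdefs.h"
	#include "storage/shmem.h"
	#include "utils/hsearch.h"

	/* hypothetical entry recording where a cross-segment record ends */
	typedef struct RecordBoundaryEntry
	{
		XLogSegNo	seg;			/* hash key: segment the record starts in */
		XLogRecPtr	endptr;			/* where the spanning record ends */
	} RecordBoundaryEntry;

	static HTAB *RecordBoundaryHash;

	static void
	record_boundary_hash_init(void)
	{
		HASHCTL		ctl;

		MemSet(&ctl, 0, sizeof(ctl));
		ctl.keysize = sizeof(XLogSegNo);
		ctl.entrysize = sizeof(RecordBoundaryEntry);

		/* init_size/max_size are estimates, not hard limits (see above) */
		RecordBoundaryHash = ShmemInitHash("Archive record boundaries",
										   16, 16,
										   &ctl, HASH_ELEM | HASH_BLOBS);
	}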
Re: WIP: BRIN multi-range indexes
On Fri, Jan 22, 2021 at 10:59 PM Tomas Vondra wrote: > > > On 1/23/21 12:27 AM, John Naylor wrote: > > Still, it would be great if multi-minmax can be a drop in replacement. I > > know there was a sticking point of a distance function not being > > available on all types, but I wonder if that can be remedied or worked > > around somehow. > > > > Hmm. I think Alvaro also mentioned he'd like to use this as a drop-in > replacement for minmax (essentially, using these opclasses as the > default ones, with the option to switch back to plain minmax). I'm not > convinced we should do that - though. Imagine you have minmax indexes in > your existing DB, it's working perfectly fine, and then we come and just > silently change that during dump/restore. Is there some past example > when we did something similar and it turned it to be OK? I was assuming pg_dump can be taught to insert explicit opclasses for minmax indexes, so that upgrade would not cause surprises. If that's true, only new indexes would have the different default opclass. > As for the distance functions, I'm pretty sure there are data types > without "natural" distance - like most strings, for example. We could > probably invent something, but the question is how much we can rely on > it working well enough in practice. > > Of course, is minmax even the right index type for such data types? > Strings are usually "labels" and not queried using range queries, > although sometimes people encode stuff as strings (but then it's very > unlikely we'll define the distance definition well). So maybe for those > types a hash / bloom would be a better fit anyway. Right. > But I do have an idea - maybe we can do without distances, in those > cases. Essentially, the primary issue of minmax indexes are outliers, so > what if we simply sort the values, keep one range in the middle and as > many single points on each tail? That's an interesting idea. I think it would be a nice bonus to try to do something along these lines. On the other hand, I'm not the one volunteering to do the work, and the patch is useful as is. -- John Naylor EDB: http://www.enterprisedb.com
Re: Allow matching whole DN from a client certificate
On Tue, 2021-01-26 at 13:49 +0100, Daniel Gustafsson wrote: > Reading more on this it seems we would essentially have to go with RFC 4514, > as > it's the closest to a standard which seems to exist. Having done a lot > research on this topic, do you have a gut feeling on direction? Yeah, if we're going to use string matching then I agree with the use of RFC 4514. I think a string-based approach is going to work for only the most simple cases, though. Anything more complicated than plain ASCII and a fixed base DN is going to become pretty unmanageable with a regex; you'd need a filter system of some sort that understands DN structure. The documentation should be clear on the limitations. > The OpenSSL X509_NAME_cmp function use RFC 5280 section 7.1 and RFC 4517 > section 4.2.15 (which in turn reference RFC 4514 for the DN string format). > libnss has CERT_AsciiToName which is referencing RFCs 1485, 1779 and 2253 in > lib/certdb/alg1485.c. Comparing the two would be interesting. Yeah. I'll poke around a bit. > Right, multi-value RDN's are defined as sets so C=US,A=B+C=D is equivalent to > C=US,C=D+A=B according to RFC 5280. Other potential differences, by my understanding of that RFC, include the quoting of the "escaped" character class (e.g. a comma can be escaped as either "\," or "\2C"), the case of hex characters ("\FF" vs "\ff"), and the option to not quote unprintable control characters (e.g. ASCII DEL, 0x7F, can be included verbatim or quoted as "\7F"). --Jacob
Re: WIP: BRIN multi-range indexes
Hi Tomas, I took another look through the Bloom opclass portion (0004) with sorted_mode omitted, and it looks good to me code-wise. I think this part is close to commit-ready. I also did one more proofreading pass for minor details. + rows per block). The default values is -0.1, and + greater than 0.0 and smaller than 1.0. The default values is 0.01, s/values/value/ + * bloom filter, we can easily and cheaply test wheter it contains values s/wheter/whether/ + * XXX We assume the bloom filters have the same parameters fow now. In the s/fow/for/ + * or null if it does not exists. s/exists/exist/ + * We do expect the bloom filter to eventually switch to hashing mode, + * and it's bound to be almost perfectly random, so not compressible. Leftover from when it started out in sorted mode. + if ((m/8) > BLCKSZ) It seems we need something more safe, to account for page header and tuple header at least. As the comment before says, the filter will eventually not be compressible. I remember we can't be exact here, with the possibility of multiple columns, but we can leave a little slack space. -- John Naylor EDB: http://www.enterprisedb.com
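For what it's worth, the kind of headroom being suggested for the (m/8) check might look something like this sketch, where m is the filter size in bits as in the quoted test; the slack factor and error wording are made up, not taken from the patch.

	Size		maxsize;

	/* leave room for the page header plus some slack for the BRIN tuple */
	maxsize = BLCKSZ - MAXALIGN(SizeOfPageHeaderData);
	maxsize = (Size) (maxsize * 0.9);	/* arbitrary slack factor */

	if ((m / 8) > maxsize)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("bloom filter size exceeds what fits on one index page")));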
Re: Deleting older versions in unique indexes to avoid page splits
On Mon, Jan 25, 2021 at 10:48 PM Amit Kapila wrote: > I need to spend more time on benchmarking to study the behavior and I > think without that it would be difficult to make a conclusion in this > regard. So, let's not consider any action on this front till I spend > more time to find the details. It is true that I committed the patch without thorough review, which was less than ideal. I welcome additional review from you now. I will say one more thing about it for now: Start with a workload, not with the code. Without bottom-up deletion (e.g. when using Postgres 13) with a simple though extreme workload that will experience version churn in indexes after a while, it still takes quite a few minutes for the first page to split (when the table is at least a few GB in size to begin with). When I was testing the patch I would notice that it could take 10 or 15 minutes for the deletion mechanism to kick in for the first time -- the patch really didn't do anything at all until perhaps 15 minutes into the benchmark, despite helping *enormously* by the 60 minute mark. And this is with significant skew, so presumably the first page that would split (in the absence of the bottom-up deletion feature) was approximately the page with the most skew -- most individual pages might have taken 30 minutes or more to split without the intervention of bottom-up deletion. Relatively rare events (in this case would-be page splits) can have very significant long term consequences for the sustainability of a workload, so relatively simple targeted interventions can make all the difference. The idea behind bottom-up deletion is to allow the workload to figure out the best way of fixing its bloat problems *naturally*. The heuristics must be simple precisely because workloads are so varied and complicated. We must be willing to pay small fixed costs for negative feedback -- it has to be okay for the mechanism to occasionally fail in order to learn what works. I freely admit that I don't understand all workloads. But I don't think anybody can. This holistic/organic approach has a lot of advantages, especially given the general uncertainty about workload characteristics. Your suspicion of the simple nature of the heuristics actually makes a lot of sense to me. I do get it. -- Peter Geoghegan
Re: Error on failed COMMIT
On Tue, 26 Jan 2021 at 12:46, Laurenz Albe wrote: > On Tue, 2021-01-26 at 12:25 -0500, Dave Cramer wrote: > > > After thinking some more about it, I think that COMMIT AND CHAIN would > have > > > to change behavior: if COMMIT throws an error (because the transaction > was > > > aborted), no new transaction should be started. Everything else seems > fishy: > > > the statement fails, but still starts a new transaction? > > > > > > I guess that's also at fault for the unexpected result status that > > > Masahiko complained about in the other message. > > > > > > I haven't had a look at the result status in libpq. For JDBC we don't > see that. > > We throw an exception when we get this error report. This is very > consistent as the commit fails and we throw an exception > > > > > So I think we should not introduce USER_ERROR at all. It is too much > > > of a kluge: fail, but not really... > > > > What we do now is actually worse as we do not get an error report and we > silently change commit to rollback. > > How is this better ? > > I see your point from the view of the JDBC driver. > > It just feels hacky - somewhat similar to what you say > above: don't go through the normal transaction rollback steps, > but issue an error message. > > At least we should fake it well... > OK, let me look into how we deal with COMMIT and CHAIN. I can see some real issues with this as Vik pointed out. Dave
Re: mkid reference
On Tue, Jan 26, 2021 at 05:03:30PM +0100, Magnus Hagander wrote: > On Mon, Jan 25, 2021 at 4:38 PM Bruce Momjian wrote: > > > > On Fri, Jan 22, 2021 at 01:07:36PM -0500, Tom Lane wrote: > > > > There's also src/tools/make_mkid which use this mkid tool. +1 for > > > > removing. > > > > If anything, it seems better replaced by extended documentation on the > > > > existing > > > > wiki article [0] on how to use "git format-patch". > > > > > > I found man pages for mkid online --- it's apparently a ctags-like > > > code indexing tool, not something for patches. So maybe Bruce still > > > uses it, or maybe not. But as long as we've also got make_ctags and > > > > Yes, I do still use it, so I thought having a script to generate its > > index files might be helpful to someone. > > Where do you actually get it? The old docs (now removed) suggested > getting it off ftp.postgresql.org... Not sure why it was on our ftp site, since it is a GNU download: https://www.gnu.org/software/idutils/ -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On Thu, Jan 21, 2021 at 3:08 PM Matthias van de Meent wrote: > Re-thinking this, and after some research: > > Is the behaviour of "any process that invalidates TIDs in this table > (that could be in an index on this table) always holds a lock that > conflicts with CIC/RiC on that table" a requirement of tableams, or is > it an implementation-detail? > > If it is a requirement, then this patch is a +1 for me (and that > requirement should be documented in such case), otherwise a -1 while > there is no mechanism in place to remove concurrently-invalidated TIDs > from CIC-ed/RiC-ed indexes. I don't believe that the table AM machinery presumes that every kind of table AM necessarily has to support VACUUM; or at least, I don't think the original version presumed that. However, if you want it to work some other way, there's not really any infrastructure for that, either. Like, if you want to run your own background workers to clean up your table, you could, but you'd have to arrange that yourself. If you want a SWEEP command or an OPTIMIZE TABLE command instead of VACUUM, you'd have to hack the grammar. Likewise, I don't know that there's any guarantee that CLUSTER would work on a table of any old AM either, or that it would do anything useful. But having said that, if you have a table AM that *does* support VACUUM and CLUSTER, I think it would have to treat TIDs pretty much just as we do today. Almost by definition, they have to live with the way the rest of the system works. TIDs have to be 6 bytes, and lock levels can't be changed, because there's nothing in table AM that would let you tinker with that stuff. The table AM can opt out of that machinery altogether in favor of just throwing an error, but it can't make it work differently unless somebody extends the API. It's possible that this might suck a lot for some AMs. For instance, if your AM is an in-memory btree, you might want CLUSTER to work in place, rather than by copying the whole relation file, but I doubt it's possible to make that work using the APIs as they exist. Maybe someday we'll have better ones, but right now we don't. As far as the specific question asked here, I don't really understand how it could ever work any differently. If you want to invalidate some TIDs in such a way that they can be reused by unrelated tuples - in the case of the heap this would be by making the line pointers LP_UNUSED - that has to be coordinated with the indexing machinery. I can imagine a table AM that never reuses TIDs - especially if we eventually have TIDs > 6 bytes - but I can't imagine a table AM that reuses TIDs that might still be present in the index without telling the index. And that seems to be what is being proposed here: if CIC or RiC could be putting TIDs into indexes while at the same time some of those very same TIDs were being made available for reuse, that would be chaos, and right now we have no mechanism other than the lock level to prevent that. We could invent one, maybe, but it doesn't exist today, and zheap never tried to invent anything like that. I agree that, if we do this, the comment should make clear that the CIC/RiC lock has to conflict with VACUUM to avoid indexing tuples that VACUUM is busy marking LP_UNUSED, and that this is precisely because other backends will ignore the CIC/RiC snapshot while determining whether a tuple is live. I don't think this is the case. Hypothetically speaking, suppose we rejected this patch. 
Then, suppose someone proposed to replace ShareUpdateExclusiveLock with two new lock levels VacuumTuplesLock and IndexTheRelationLock each of which conflicts with itself and all other lock levels with which ShareUpdateExclusiveLock conflicts, but the two don't conflict with each other. AFAIK, that patch would be acceptable today and unacceptable after this change. While I'm not arguing that as a reason to reject this patch, it's not impossible that somebody could: they'd just need to argue that the separated lock levels would have greater value than this patch. I don't think I believe that, but it's not a totally crazy suggestion. My main point though is that this meaningfully changing some assumptions about how the system works, and we should be sure to be clear about that. I looked at the patch but don't really understand it; the code in this area seems to have changed a lot since I last looked at it. So, I'm just commenting mostly on the specific question that was asked, and a little bit on the theory of the patch, but without expressing an opinion on the implementation. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Error on failed COMMIT
On Tue, 2021-01-26 at 12:25 -0500, Dave Cramer wrote: > > After thinking some more about it, I think that COMMIT AND CHAIN would have > > to change behavior: if COMMIT throws an error (because the transaction was > > aborted), no new transaction should be started. Everything else seems > > fishy: > > the statement fails, but still starts a new transaction? > > > > I guess that's also at fault for the unexpected result status that > > Masahiko complained about in the other message. > > > I haven't had a look at the result status in libpq. For JDBC we don't see > that. > We throw an exception when we get this error report. This is very consistent > as the commit fails and we throw an exception > > > So I think we should not introduce USER_ERROR at all. It is too much > > of a kluge: fail, but not really... > > What we do now is actually worse as we do not get an error report and we > silently change commit to rollback. > How is this better ? I see your point from the view of the JDBC driver. It just feels hacky - somewhat similar to what you say above: don't go through the normal transaction rollback steps, but issue an error message. At least we should fake it well... Yours, Laurenz Albe
Re: Error on failed COMMIT
On 1/26/21 6:34 PM, Vik Fearing wrote: > On 1/26/21 6:20 PM, Laurenz Albe wrote: >> After thinking some more about it, I think that COMMIT AND CHAIN would have >> to change behavior: if COMMIT throws an error (because the transaction was >> aborted), no new transaction should be started. Everything else seems fishy: >> the statement fails, but still starts a new transaction? > > The standard is not clear (to me) on what exactly should happen here. > It says that if a <commit statement> is not successful then a <rollback statement> > is implied, but I don't see it say anything about whether the > AND CHAIN should be propagated too. > > My vote is that COMMIT AND CHAIN should become ROLLBACK AND NO CHAIN. Hmm. On the other hand, that means if the client isn't paying attention, it'll start executing commands outside of a transaction which will autocommit. It might be better for a new transaction to be chained and hopefully also fail because previous bits are missing. I will hastily change my vote to "unsure". -- Vik Fearing
Re: Error on failed COMMIT
On 1/26/21 6:20 PM, Laurenz Albe wrote: > After thinking some more about it, I think that COMMIT AND CHAIN would have > to change behavior: if COMMIT throws an error (because the transaction was > aborted), no new transaction should be started. Everything else seems fishy: > the statement fails, but still starts a new transaction? The standard is not clear (to me) on what exactly should happen here. It says that if a <commit statement> is not successful then a <rollback statement> is implied, but I don't see it say anything about whether the AND CHAIN should be propagated too. My vote is that COMMIT AND CHAIN should become ROLLBACK AND NO CHAIN. -- Vik Fearing
Re: Error on failed COMMIT
On Tue, 26 Jan 2021 at 12:20, Laurenz Albe wrote: > On Tue, 2021-01-26 at 11:09 -0500, Dave Cramer wrote: > > On Tue, 26 Jan 2021 at 05:05, Laurenz Albe > wrote: > > > > > I wonder about the introduction of the new USER_ERROR level: > > > > > > #define WARNING_CLIENT_ONLY20 /* Warnings to be sent to client > as usual, but > > > * never to the server log. */ > > > -#define ERROR 21 /* user error - abort transaction; > return to > > > +#define USER_ERROR 21 > > > +#define ERROR 22 /* user error - abort transaction; > return to > > > * known state */ > > > /* Save ERROR value in PGERROR so it can be restored when Win32 > includes > > > * modify it. We have to use a constant rather than ERROR because > macros > > > * are expanded only when referenced outside macros. > > > */ > > > #ifdef WIN32 > > > -#define PGERROR21 > > > +#define PGERROR22 > > > #endif > > > -#define FATAL 22 /* fatal error - abort process */ > > > -#define PANIC 23 /* take down the other backends with > me */ > > > +#define FATAL 23 /* fatal error - abort process */ > > > +#define PANIC 24 /* take down the other backends with > me */ > > > > > > I see that without that, COMMIT AND CHAIN does not behave correctly, > > > since the respective regression tests fail. > > > > > > But I don't understand why. I think that this needs some more > comments to > > > make this clear. > > > > First off thanks for reviewing. > > > > The problem is that ereport does not return for any level equal to or > above ERROR. > > This code required it to return so that it could continue processing > > Oh, I see. > > After thinking some more about it, I think that COMMIT AND CHAIN would have > to change behavior: if COMMIT throws an error (because the transaction was > aborted), no new transaction should be started. Everything else seems > fishy: > the statement fails, but still starts a new transaction? > > I guess that's also at fault for the unexpected result status that > Masahiko complained about in the other message. > I haven't had a look at the result status in libpq. For JDBC we don't see that. We throw an exception when we get this error report. This is very consistent as the commit fails and we throw an exception > So I think we should not introduce USER_ERROR at all. It is too much > of a kluge: fail, but not really... > What we do now is actually worse as we do not get an error report and we silently change commit to rollback. How is this better ? Dave >
Re: Error on failed COMMIT
On Tue, 2021-01-26 at 11:09 -0500, Dave Cramer wrote: > On Tue, 26 Jan 2021 at 05:05, Laurenz Albe wrote: > > > I wonder about the introduction of the new USER_ERROR level: > > > > #define WARNING_CLIENT_ONLY20 /* Warnings to be sent to client as > > usual, but > > * never to the server log. */ > > -#define ERROR 21 /* user error - abort transaction; return to > > +#define USER_ERROR 21 > > +#define ERROR 22 /* user error - abort transaction; return to > > * known state */ > > /* Save ERROR value in PGERROR so it can be restored when Win32 includes > > * modify it. We have to use a constant rather than ERROR because macros > > * are expanded only when referenced outside macros. > > */ > > #ifdef WIN32 > > -#define PGERROR21 > > +#define PGERROR22 > > #endif > > -#define FATAL 22 /* fatal error - abort process */ > > -#define PANIC 23 /* take down the other backends with me */ > > +#define FATAL 23 /* fatal error - abort process */ > > +#define PANIC 24 /* take down the other backends with me */ > > > > I see that without that, COMMIT AND CHAIN does not behave correctly, > > since the respective regression tests fail. > > > > But I don't understand why. I think that this needs some more comments to > > make this clear. > > First off thanks for reviewing. > > The problem is that ereport does not return for any level equal to or above > ERROR. > This code required it to return so that it could continue processing Oh, I see. After thinking some more about it, I think that COMMIT AND CHAIN would have to change behavior: if COMMIT throws an error (because the transaction was aborted), no new transaction should be started. Everything else seems fishy: the statement fails, but still starts a new transaction? I guess that's also at fault for the unexpected result status that Masahiko complained about in the other message. So I think we should not introduce USER_ERROR at all. It is too much of a kluge: fail, but not really... I guess that is one example for the incompatibilities that Tom worried about upthread. I am beginning to see his point better now. Yours, Laurenz Albe
Re: FETCH FIRST clause PERCENT option
On Mon, Jan 25, 2021 at 2:39 PM Kyotaro Horiguchi wrote: > Sorry for the delay. I started to look at this. Hi Kyotaro, Thanks for looking into this, but did we agree to proceed on this approach? I fear that it will be a waste of effort if Andrew comes up with the patch for his approach. Andrew Gierth: Are you working on your approach? regards Surafel
Re: Error on failed COMMIT
On Tue, 26 Jan 2021 at 05:05, Laurenz Albe wrote: > On Mon, 2021-01-25 at 11:29 -0500, Dave Cramer wrote: > > Rebased against head > > > > Here's my summary of the long thread above. > > > > This change is in keeping with the SQL spec. > > > > There is an argument (Tom) that says that this will annoy more people > than it will please. > > I presume this is due to the fact that libpq behaviour will change. > > > > As the author of the JDBC driver, and I believe I speak for other driver > (NPGSQL for one) > > authors as well that have implemented the protocol I would argue that > the current behaviour > > is more annoying. > > > > We currently have to keep state and determine if COMMIT actually failed > or it ROLLED BACK. > > There are a number of async drivers that would also benefit from not > having to keep state > > in the session. > > I think this change makes sense, but I think everybody agrees that it does > as it > makes PostgreSQL more standard compliant. > > About the fear that it will break user's applications: > > I think that the breakage will be minimal. All that will change is that > COMMIT of > an aborted transaction raises an error. > > Applications that catch an error in a transaction and roll back will not > be affected. What will be affected are applications that do *not* check > for > errors in statements in a transaction, but check for errors in the COMMIT. > I think that doesn't happen often. > > I agree that some people will be hurt, but I don't think it will be a > major problem. > > The patch applies and passes regression tests. > > I wonder about the introduction of the new USER_ERROR level: > > #define WARNING_CLIENT_ONLY20 /* Warnings to be sent to client as > usual, but > * never to the server log. */ > -#define ERROR 21 /* user error - abort transaction; return > to > +#define USER_ERROR 21 > +#define ERROR 22 /* user error - abort transaction; return > to > * known state */ > /* Save ERROR value in PGERROR so it can be restored when Win32 includes > * modify it. We have to use a constant rather than ERROR because macros > * are expanded only when referenced outside macros. > */ > #ifdef WIN32 > -#define PGERROR21 > +#define PGERROR22 > #endif > -#define FATAL 22 /* fatal error - abort process */ > -#define PANIC 23 /* take down the other backends with me */ > +#define FATAL 23 /* fatal error - abort process */ > +#define PANIC 24 /* take down the other backends with me */ > > I see that without that, COMMIT AND CHAIN does not behave correctly, > since the respective regression tests fail. > > But I don't understand why. I think that this needs some more comments to > make this clear. > > First off thanks for reviewing. The problem is that ereport does not return for any level equal to or above ERROR. This code required it to return so that it could continue processing So after re-acquainting myself with the code I propose: Now we could use "TRANSACTION_ERROR" instead of "USER_ERROR" I'd like to comment more but I do not believe that elog.h is the place. Suggestions ? index 3c0e57621f..df79a2d6db 100644 --- a/src/include/utils/elog.h +++ b/src/include/utils/elog.h @@ -42,17 +42,19 @@ * WARNING is for unexpected messages. */ #define WARNING_CLIENT_ONLY20 /* Warnings to be sent to client as usual, but * never to the server log. */ -#define ERROR 21 /* user error - abort transaction; return to +#define USER_ERROR 21 /* similar to ERROR, except we don't want to + * exit the current context. 
*/ +#define ERROR 22 /* user error - abort transaction; return to * known state */ /* Save ERROR value in PGERROR so it can be restored when Win32 includes * modify it. We have to use a constant rather than ERROR because macros * are expanded only when referenced outside macros. */ #ifdef WIN32 -#define PGERROR21 +#define PGERROR22 #endif -#define FATAL 22 /* fatal error - abort process */ -#define PANIC 23 /* take down the other backends with me */ +#define FATAL 23 /* fatal error - abort process */ +#define PANIC 24 /* take down the other backends with me */ > Is this new message level something we need to allow setting for > "client_min_messages" and "log_min_messages"? > Good question. I had not given that any thought. Dave Cramer www.postgres.rocks > >
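To make the intent of the proposed level a bit more concrete: because the new level sorts below ERROR, ereport() returns to the caller instead of longjmp'ing away, so COMMIT AND CHAIN processing can continue after reporting the failure. A rough sketch, where the condition, error code, and message are placeholders rather than code from the patch:

	if (transaction_was_aborted)	/* hypothetical condition */
		ereport(USER_ERROR,
				(errcode(ERRCODE_TRANSACTION_ROLLBACK),
				 errmsg("cannot commit an aborted transaction")));

	/* control reaches here because USER_ERROR < ERROR; chain the new xact */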
Re: mkid reference
On Mon, Jan 25, 2021 at 4:38 PM Bruce Momjian wrote: > > On Fri, Jan 22, 2021 at 01:07:36PM -0500, Tom Lane wrote: > > > There's also src/tools/make_mkid which use this mkid tool. +1 for > > > removing. > > > If anything, it seems better replaced by extended documentation on the > > > existing > > > wiki article [0] on how to use "git format-patch". > > > > I found man pages for mkid online --- it's apparently a ctags-like > > code indexing tool, not something for patches. So maybe Bruce still > > uses it, or maybe not. But as long as we've also got make_ctags and > > Yes, I do still use it, so I thought having a script to generate its > index files might be helpful to someone. Where do you actually get it? The old docs (now removed) suggested getting it off ftp.postgresql.org... -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: About to add WAL write/fsync statistics to pg_stat_wal view
On Mon, Jan 25, 2021 at 11:56 PM Masahiro Ikeda wrote: > > > (wal_write) > > The number of times WAL buffers were written out to disk via XLogWrite > > > > Thanks. > > I thought it's better to omit "The" and "XLogWrite" because other views' > description > omits "The" and there is no description of "XlogWrite" in the documents. > What do you think? > > The documentation for WAL does get into the public API level of detail and doing so here makes what this measures crystal clear. The potential absence of sufficient detail elsewhere should be corrected instead of making this description more vague. Specifically, probably XLogWrite should be added to the WAL overview as part of this update and probably even have the descriptive section of the documentation note that the number of times that said function is executed is exposed as a counter in the wal statistics table - thus closing the loop. David J.
Re: Error on failed COMMIT
On Tue, 26 Jan 2021 at 06:59, Masahiko Sawada wrote: > On Tue, Jan 26, 2021 at 7:06 PM Laurenz Albe > wrote: > > > > On Mon, 2021-01-25 at 11:29 -0500, Dave Cramer wrote: > > > Rebased against head > > > > > > Here's my summary of the long thread above. > > > > > > This change is in keeping with the SQL spec. > > > > > > There is an argument (Tom) that says that this will annoy more people > than it will please. > > > I presume this is due to the fact that libpq behaviour will change. > > > > > > As the author of the JDBC driver, and I believe I speak for other > driver (NPGSQL for one) > > > authors as well that have implemented the protocol I would argue that > the current behaviour > > > is more annoying. > > > > > > We currently have to keep state and determine if COMMIT actually > failed or it ROLLED BACK. > > > There are a number of async drivers that would also benefit from not > having to keep state > > > in the session. > > > > I think this change makes sense, but I think everybody agrees that it > does as it > > makes PostgreSQL more standard compliant. > > > > About the fear that it will break user's applications: > > > > I think that the breakage will be minimal. All that will change is that > COMMIT of > > an aborted transaction raises an error. > > > > Applications that catch an error in a transaction and roll back will not > > be affected. What will be affected are applications that do *not* check > for > > errors in statements in a transaction, but check for errors in the > COMMIT. > > I think that doesn't happen often. > > > > I agree that some people will be hurt, but I don't think it will be a > major problem. > > > > The patch applies and passes regression tests. > > > > I wonder about the introduction of the new USER_ERROR level: > > > > #define WARNING_CLIENT_ONLY20 /* Warnings to be sent to client as > usual, but > > * never to the server log. */ > > -#define ERROR 21 /* user error - abort transaction; > return to > > +#define USER_ERROR 21 > > +#define ERROR 22 /* user error - abort transaction; > return to > > * known state */ > > /* Save ERROR value in PGERROR so it can be restored when Win32 includes > > * modify it. We have to use a constant rather than ERROR because > macros > > * are expanded only when referenced outside macros. > > */ > > #ifdef WIN32 > > -#define PGERROR21 > > +#define PGERROR22 > > #endif > > -#define FATAL 22 /* fatal error - abort process */ > > -#define PANIC 23 /* take down the other backends with me > */ > > +#define FATAL 23 /* fatal error - abort process */ > > +#define PANIC 24 /* take down the other backends with me > */ > > > > I see that without that, COMMIT AND CHAIN does not behave correctly, > > since the respective regression tests fail. > > > > But I don't understand why. I think that this needs some more comments > to > > make this clear. > > While testing the patch I realized that the client gets an > acknowledgment of COMMIT command completed successfully from > PostgreSQL server (i.g., PQgetResult() returns PGRES_COMMAND_OK) even > if the server raises an USER_ERROR level error. I think the command > should be failed. Because otherwise, the drivers need to throw an > exception by re-interpreting the results even in a case where the > command is completed successfully. > > Regards, > Interesting. Thanks for looking at this. I'm curious what we return now when we return rollback instead Dave > > -- > Masahiko Sawada > EDB: https://www.enterprisedb.com/ >
Re: WIP: System Versioned Temporal Table
On Tue, Jan 26, 2021 at 12:51 PM Vik Fearing wrote: > > On 1/26/21 1:16 PM, Simon Riggs wrote: > > On Tue, Jan 26, 2021 at 11:33 AM Vik Fearing > > wrote: > >> > >> On 1/11/21 3:02 PM, Simon Riggs wrote: > >>> * UPDATE foo SET start_timestamp = DEFAULT should fail but currently > >>> doesn't > >> > >> I'm still in the weeds of reviewing this patch, but why should this > >> fail? It should not fail. > > > > It should not be possible for the user to change the start or end > > timestamp of a system_time time range, by definition. > > Correct, but setting it to DEFAULT is not changing it. > > See also SQL:2016 11.5 General Rule 3.a. Thanks for pointing this out. Identity columns don't currently work that way. -- Simon Riggshttp://www.EnterpriseDB.com/
Re: WIP: System Versioned Temporal Table
On 1/26/21 1:16 PM, Simon Riggs wrote: > On Tue, Jan 26, 2021 at 11:33 AM Vik Fearing wrote: >> >> On 1/11/21 3:02 PM, Simon Riggs wrote: >>> * UPDATE foo SET start_timestamp = DEFAULT should fail but currently doesn't >> >> I'm still in the weeds of reviewing this patch, but why should this >> fail? It should not fail. > > It should not be possible for the user to change the start or end > timestamp of a system_time time range, by definition. Correct, but setting it to DEFAULT is not changing it. See also SQL:2016 11.5 General Rule 3.a. -- Vik Fearing
Re: Allow matching whole DN from a client certificate
> On 20 Jan 2021, at 20:07, Jacob Champion wrote: > > On Mon, 2021-01-18 at 11:23 +0100, Daniel Gustafsson wrote: >> + /* use commas instead of slashes */ >> + X509_NAME_print_ex(bio, x509name, 0, XN_FLAG_SEP_COMMA_PLUS); >> I don't have strong opinions on whether we shold use slashes or commas, but I >> think it needs to be documented that commas are required since slashes is the >> more common way to print a DN. > > There's also a XN_FLAG_RFC2253 flag, which claims to print in an RFC- > compatible escape system. (It includes XN_FLAG_SEP_COMMA_PLUS.) I think > NSS uses a similar RFC-based escape scheme, but I haven't checked to > see whether it's fully compatible. > > I think you'll want to be careful to specify the format as much as > possible, both to make sure that other backend TLS implementations can > actually use the same escaping system and to ensure that user regexes > don't suddenly start matching different things at some point in the > future. Reading more on this it seems we would essentially have to go with RFC 4514, as it's the closest to a standard which seems to exist. Having done a lot research on this topic, do you have a gut feeling on direction? The OpenSSL X509_NAME_cmp function use RFC 5280 section 7.1 and RFC 4517 section 4.2.15 (which in turn reference RFC 4514 for the DN string format). libnss has CERT_AsciiToName which is referencing RFCs 1485, 1779 and 2253 in lib/certdb/alg1485.c. Comparing the two would be interesting. > Even when using RFC 2253 (now 4514) escaping, there are things left to the > implementation, such as the order of AVAs inside multivalued RDNs. The RFC > warns not to consider these representations to be canonical forms, so it's > possible that switching (or upgrading) the TLS library in use could force > users to change their regexes at some point in the future. Right, multi-value RDN's are defined as sets so C=US,A=B+C=D is equivalent to C=US,C=D+A=B according to RFC 5280. -- Daniel Gustafsson https://vmware.com/
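As a point of comparison, printing the peer's subject DN with OpenSSL's RFC 2253-style escaping is a one-flag change from the XN_FLAG_SEP_COMMA_PLUS call quoted earlier in the thread. A minimal sketch follows; error handling is omitted and the helper name is made up.

	#include <string.h>
	#include <openssl/bio.h>
	#include <openssl/x509.h>

	static char *
	peer_dn_rfc2253(X509 *peer)
	{
		BIO		   *membio = BIO_new(BIO_s_mem());
		char	   *data;
		long		len;
		char	   *dn;

		/* RFC 2253/4514 escaping instead of XN_FLAG_SEP_COMMA_PLUS */
		X509_NAME_print_ex(membio, X509_get_subject_name(peer), 0, XN_FLAG_RFC2253);
		len = BIO_get_mem_data(membio, &data);
		dn = strndup(data, len);	/* caller frees */
		BIO_free(membio);
		return dn;
	}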
Re: Extensions not dumped when --schema is used
Le mar. 26 janv. 2021 à 13:41, Guillaume Lelarge a écrit : > Le mar. 26 janv. 2021 à 05:10, Julien Rouhaud a > écrit : > >> On Mon, Jan 25, 2021 at 9:34 PM Guillaume Lelarge >> wrote: >> > >> > "Anytime soon" was a long long time ago, and I eventually completely >> forgot this, sorry. As nobody worked on it yet, I took a shot at it. See >> attached patch. >> >> Great! >> >> I didn't reviewed it thoroughly yet, but after a quick look it sounds >> sensible. I'd prefer to see some tests added, and it looks like a >> test for plpgsql could be added quite easily. >> >> > I tried that all afternoon yesterday, but failed to do so. My had still > hurts, but I'll try again though it may take some time. > > s/My had/My head/ .. > I don't know if I should add this right away in the commit fest app. If >> yes, I guess it should go on the next commit fest (2021-03), right? >> >> Correct, and please add it on the commit fest! >> > > Done, see https://commitfest.postgresql.org/32/2956/. > > -- Guillaume.
Re: Extensions not dumped when --schema is used
Le mar. 26 janv. 2021 à 05:10, Julien Rouhaud a écrit : > On Mon, Jan 25, 2021 at 9:34 PM Guillaume Lelarge > wrote: > > > > "Anytime soon" was a long long time ago, and I eventually completely > forgot this, sorry. As nobody worked on it yet, I took a shot at it. See > attached patch. > > Great! > > I didn't reviewed it thoroughly yet, but after a quick look it sounds > sensible. I'd prefer to see some tests added, and it looks like a > test for plpgsql could be added quite easily. > > I tried that all afternoon yesterday, but failed to do so. My had still hurts, but I'll try again though it may take some time. > I don't know if I should add this right away in the commit fest app. If > yes, I guess it should go on the next commit fest (2021-03), right? > > Correct, and please add it on the commit fest! > Done, see https://commitfest.postgresql.org/32/2956/. -- Guillaume.
Re: WIP: System Versioned Temporal Table
On Tue, Jan 26, 2021 at 11:33 AM Vik Fearing wrote: > > On 1/11/21 3:02 PM, Simon Riggs wrote: > > * UPDATE foo SET start_timestamp = DEFAULT should fail but currently doesn't > > I'm still in the weeds of reviewing this patch, but why should this > fail? It should not fail. It should not be possible for the user to change the start or end timestamp of a system_time time range, by definition. -- Simon Riggshttp://www.EnterpriseDB.com/
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
Hello Joel, Thank you for your kind words and happy that you benefited from this patch. We simply assert that the update/delete method used is supported currently only "NO ACTION" and "RESTRICT", this can be extended in future patches without rework, just extra logic. Please don't hesitate to give your feedback. I would love some help with some performance comparisons if you are up to it, between Many-to-Many, Foreign Key Arrays and Gin Indexed Foreign Key Arrays. /Mark On Tue, Jan 26, 2021 at 1:51 PM Joel Jacobson wrote: > Hi Mark, > > On Mon, Jan 25, 2021, at 09:14, Joel Jacobson wrote: > > I'll continue testing over the next couple of days and report if I find > any more oddities. > > > I've continued testing by trying to make full use of this feature in the > catalog-diff-tool I'm working on. > > Today I became aware of a SQL feature that I must confess I've never used > before, > that turned out to be useful in my tool, as I then wouldn't need to follow > FKs > to do updates manually, but let the database do them automatically. > > I'm talking about "ON CASCADE UPDATE", which I see is not supported in the > patch. > > In my tool, I could simplify the code for normal FKs by using ON CASCADE > UPDATE, > but will continue to need my hand-written traverse-FKs-recursively code > for Foreign Key Arrays. > > I lived a long SQL life without ever needing ON CASCADE UPDATE, > so I would definitively vote for Foreign Key Arrays to be added even > without support for this. > > Would you say the patch is written in a way which would allow adding > support for ON CASCADE UPDATE > later on mostly by adding code, or would it require a major rewrite? > > I hesitated if I should share this with you, since I'm really grateful for > this feature even without ON CASCADE UPDATE, > but since this was discovered when testing real-life scenario and not some > hypothetical example, > I felt it should be noted that I stumbled upon this during testing. > > Again, thank you so much for working on this. > > /Joel >
Re: Error on failed COMMIT
On Tue, Jan 26, 2021 at 7:06 PM Laurenz Albe wrote: > > On Mon, 2021-01-25 at 11:29 -0500, Dave Cramer wrote: > > Rebased against head > > > > Here's my summary of the long thread above. > > > > This change is in keeping with the SQL spec. > > > > There is an argument (Tom) that says that this will annoy more people than > > it will please. > > I presume this is due to the fact that libpq behaviour will change. > > > > As the author of the JDBC driver, and I believe I speak for other driver > > (NPGSQL for one) > > authors as well that have implemented the protocol I would argue that the > > current behaviour > > is more annoying. > > > > We currently have to keep state and determine if COMMIT actually failed or > > it ROLLED BACK. > > There are a number of async drivers that would also benefit from not > > having to keep state > > in the session. > > I think this change makes sense, but I think everybody agrees that it does as > it > makes PostgreSQL more standard compliant. > > About the fear that it will break user's applications: > > I think that the breakage will be minimal. All that will change is that > COMMIT of > an aborted transaction raises an error. > > Applications that catch an error in a transaction and roll back will not > be affected. What will be affected are applications that do *not* check for > errors in statements in a transaction, but check for errors in the COMMIT. > I think that doesn't happen often. > > I agree that some people will be hurt, but I don't think it will be a major > problem. > > The patch applies and passes regression tests. > > I wonder about the introduction of the new USER_ERROR level: > > #define WARNING_CLIENT_ONLY20 /* Warnings to be sent to client as > usual, but > * never to the server log. */ > -#define ERROR 21 /* user error - abort transaction; return to > +#define USER_ERROR 21 > +#define ERROR 22 /* user error - abort transaction; return to > * known state */ > /* Save ERROR value in PGERROR so it can be restored when Win32 includes > * modify it. We have to use a constant rather than ERROR because macros > * are expanded only when referenced outside macros. > */ > #ifdef WIN32 > -#define PGERROR21 > +#define PGERROR22 > #endif > -#define FATAL 22 /* fatal error - abort process */ > -#define PANIC 23 /* take down the other backends with me */ > +#define FATAL 23 /* fatal error - abort process */ > +#define PANIC 24 /* take down the other backends with me */ > > I see that without that, COMMIT AND CHAIN does not behave correctly, > since the respective regression tests fail. > > But I don't understand why. I think that this needs some more comments to > make this clear. While testing the patch I realized that the client gets an acknowledgment of COMMIT command completed successfully from PostgreSQL server (i.g., PQgetResult() returns PGRES_COMMAND_OK) even if the server raises an USER_ERROR level error. I think the command should be failed. Because otherwise, the drivers need to throw an exception by re-interpreting the results even in a case where the command is completed successfully. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
Hi Mark, On Mon, Jan 25, 2021, at 09:14, Joel Jacobson wrote: > I'll continue testing over the next couple of days and report if I find any > more oddities. I've continued testing by trying to make full use of this feature in the catalog-diff-tool I'm working on. Today I became aware of a SQL feature that I must confess I've never used before, that turned out to be useful in my tool, as I then wouldn't need to follow FKs to do updates manually, but let the database do them automatically. I'm talking about "ON CASCADE UPDATE", which I see is not supported in the patch. In my tool, I could simplify the code for normal FKs by using ON CASCADE UPDATE, but will continue to need my hand-written traverse-FKs-recursively code for Foreign Key Arrays. I lived a long SQL life without ever needing ON CASCADE UPDATE, so I would definitively vote for Foreign Key Arrays to be added even without support for this. Would you say the patch is written in a way which would allow adding support for ON CASCADE UPDATE later on mostly by adding code, or would it require a major rewrite? I hesitated if I should share this with you, since I'm really grateful for this feature even without ON CASCADE UPDATE, but since this was discovered when testing real-life scenario and not some hypothetical example, I felt it should be noted that I stumbled upon this during testing. Again, thank you so much for working on this. /Joel
Re: WIP: System Versioned Temporal Table
On 1/11/21 3:02 PM, Simon Riggs wrote: > * UPDATE foo SET start_timestamp = DEFAULT should fail but currently doesn't I'm still in the weeds of reviewing this patch, but why should this fail? It should not fail. -- Vik Fearing
Fix ALTER SUBSCRIPTION ... SET PUBLICATION documentation
Hi, When I read the documentation of ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (...), it says "set_publication_option" only support "refresh" in documentation [1]. However, we can also supply the "copy_data" option, and the code is: case ALTER_SUBSCRIPTION_PUBLICATION: { boolcopy_data; boolrefresh; parse_subscription_options(stmt->options, NULL,/* no "connect" */ NULL, NULL, /* no "enabled" */ NULL,/* no "create_slot" */ NULL, NULL, /* no "slot_name" */ _data, NULL,/* no "synchronous_commit" */ , NULL, NULL, /* no "binary" */ NULL, NULL); /* no "streaming" */ values[Anum_pg_subscription_subpublications - 1] = publicationListToArray(stmt->publication); replaces[Anum_pg_subscription_subpublications - 1] = true; update_tuple = true; /* Refresh if user asked us to. */ if (refresh) { if (!sub->enabled) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"), errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false)."))); /* Make sure refresh sees the new list of publications. */ sub->publications = stmt->publication; AlterSubscription_refresh(sub, copy_data); } break; } Should we fix the documentation or the code? I'd be inclined fix the documentation. [1] - https://www.postgresql.org/docs/devel/sql-altersubscription.html -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd. diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index 97c427e0f4..418e9a8ce1 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -101,6 +101,17 @@ ALTER SUBSCRIPTION name RENAME TO < + +copy_data (boolean) + + + Specifies whether the existing data in the publications that are + being subscribed to should be copied once the replication starts. + The default is true. (Previously subscribed + tables are not copied.) + + + Additionally, refresh options as described
Re: shared tempfile was not removed on statement_timeout
At Tue, 26 Jan 2021 11:00:56 +0200, Heikki Linnakangas wrote in > On 26/01/2021 06:46, Kyotaro Horiguchi wrote: > > Looking the comment of SharedFileSetOnDetach: > > | * everything in them. We can't raise an error on failures, because this > > | * runs > > | * in error cleanup paths. > > I feel that a function that shouldn't error-out also shouldn't be > > cancellable. If that's the case, we omit the CHECK_FOR_INTERRUPTS() in > > walkdir() when elevel is smaller than ERROR. > > = > > diff --git a/src/backend/storage/file/fd.c > > b/src/backend/storage/file/fd.c > > index b58502837a..593c23553e 100644 > > --- a/src/backend/storage/file/fd.c > > +++ b/src/backend/storage/file/fd.c > > @@ -3374,7 +3374,9 @@ walkdir(const char *path, > > { > > charsubpath[MAXPGPATH * 2]; > > - CHECK_FOR_INTERRUPTS(); > > + /* omit interrupts while we shouldn't error-out */ > > + if (elevel >= ERROR) > > + CHECK_FOR_INTERRUPTS(); > > if (strcmp(de->d_name, ".") == 0 || > > strcmp(de->d_name, "..") == 0) > > = > > Don't we potentially have the same problem with all on_dsm_detach > callbacks? Looking at the other on_dsm_detach callbacks, I don't see > any CHECK_FOR_INTERRUPT() calls in them, but it seems fragile to > assume that. > > I'd suggest adding HOLD/RESUME_INTERRUPTS() to dsm_detach(). At least > around the removal of the callback from the list and calling the > callback. Maybe even over the whole function. Yes, I first came up with HOLD/RESUME_QUERY_INTERRUPTS() to the same location. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
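Heikki's suggestion would amount to something like the sketch below inside dsm_detach(); the callback-list handling is paraphrased from memory of dsm.c and may not match it exactly, but it shows where the HOLD/RESUME pair would go.

	HOLD_INTERRUPTS();

	/* run on-detach callbacks without letting a cancel interrupt them */
	while (!slist_is_empty(&seg->on_detach))
	{
		slist_node *node = slist_pop_head_node(&seg->on_detach);
		dsm_segment_detach_callback *cb =
			slist_container(dsm_segment_detach_callback, node, node);
		on_dsm_detach_callback function = cb->function;
		Datum		arg = cb->arg;

		pfree(cb);
		function(seg, arg);
	}

	RESUME_INTERRUPTS();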
Re: Support ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION ... syntax
On Tue, 26 Jan 2021 at 13:46, japin wrote: > Hi, Bharath > > Thanks for your reviewing. > > On Tue, 26 Jan 2021 at 12:55, Bharath Rupireddy > wrote: >> On Tue, Jan 26, 2021 at 9:25 AM japin wrote: >>> > I think it's more convenient. Any thoughts? >>> >>> Sorry, I forgot to attach the patch. >> >> As I mentioned earlier in [1], +1 from my end to have the new syntax >> for adding/dropping publications from subscriptions i.e. ALTER >> SUBSCRIPTION ... ADD/DROP PUBLICATION. But I'm really not sure why >> that syntax was not added earlier. Are we missing something here? >> > > Yeah, we should figure out why we do not support this syntax earlier. It > seems > ALTER SUBSCRIPTION is introduced in 665d1fad99e, however the commit do not > contains any discussion URL. > >> I would like to hear opinions from other hackers. >> >> [1] - >> https://www.postgresql.org/message-id/CALj2ACVGDNZDQk3wfv%3D3zYTg%3DqKUaEa5s1f%2B9_PYxN0QyAUdCw%40mail.gmail.com >> >> Some quick comments on the patch, although I have not taken a deeper look at >> it: >> >> 1. IMO, it will be good if the patch is divided into say coding, test >> cases and documentation > > Agreed. I will refactor it in next patch. > Split v1 path into coding, test cases, documentation and tab-complete. >> 2. Looks like AlterSubscription() is already having ~200 LOC, why >> can't we have separate functions for each ALTER_SUBSCRIPTION_ case >> or at least for the new code that's getting added for this patch? > > I'm not sure it is necessary to provide a separate functions for each > ALTER_SUBSCRIPTION_XXX, so I just followed current style. > >> 3. The new code added for ALTER_SUBSCRIPTION_ADD_PUBLICATION and >> ALTER_SUBSCRIPTION_DROP_PUBLICATION looks almost same maybe with >> little difference, so why can't we have single function >> (alter_subscription_add_or_drop_publication or >> hanlde_add_or_drop_publication or some better name?) and pass in a >> flag to differentiate the code that differs for both commands. > > Agreed. At present, I create a new function merge_subpublications() to merge the origin publications and add/drop publications. Thoughts? -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd. >From 5af6cc67938b31b39fa1998a10a9c7f1bdd8fb0e Mon Sep 17 00:00:00 2001 From: Japin Li Date: Tue, 26 Jan 2021 15:43:52 +0800 Subject: [PATCH v2 1/4] Add ALTER SUBSCRIPTION...ADD/DROP PUBLICATION... syntax --- src/backend/commands/subscriptioncmds.c | 119 src/backend/parser/gram.y | 20 src/include/nodes/parsenodes.h | 2 + 3 files changed, 141 insertions(+) diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 082f7855b8..07167c9a8b 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -46,6 +46,8 @@ #include "utils/syscache.h" static List *fetch_table_list(WalReceiverConn *wrconn, List *publications); +static List *merge_subpublications(HeapTuple tuple, TupleDesc tupledesc, + List *publications, bool addpub); /* * Common option parsing function for CREATE and ALTER SUBSCRIPTION commands. 
@@ -857,6 +859,51 @@ AlterSubscription(AlterSubscriptionStmt *stmt) break; } + case ALTER_SUBSCRIPTION_ADD_PUBLICATION: + case ALTER_SUBSCRIPTION_DROP_PUBLICATION: + { +bool copy_data; +bool refresh; +List *publications = NIL; + +publications = merge_subpublications(tup, RelationGetDescr(rel), + stmt->publication, + stmt->kind == ALTER_SUBSCRIPTION_ADD_PUBLICATION); + +parse_subscription_options(stmt->options, + NULL, /* no "connect" */ + NULL, NULL, /* no "enabled" */ + NULL, /* no "create_slot" */ + NULL, NULL, /* no "slot_name" */ + _data, + NULL, /* no "synchronous_commit" */ + , + NULL, NULL, /* no "binary" */ + NULL, NULL); /* no "streaming" */ +values[Anum_pg_subscription_subpublications - 1] = + publicationListToArray(publications); +replaces[Anum_pg_subscription_subpublications - 1] = true; + +update_tuple = true; + +/* Refresh if user asked us to. */ +if (refresh) +{ + if (!sub->enabled) + ereport(ERROR, +(errcode(ERRCODE_SYNTAX_ERROR), + errmsg("ALTER SUBSCRIPTION with refresh is not allowed for disabled subscriptions"), + errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION ... WITH (refresh = false)."))); + + /* Make sure refresh sees the new list of publications. */ + sub->publications = publications; + + AlterSubscription_refresh(sub, copy_data); +} + +break; + } + case ALTER_SUBSCRIPTION_REFRESH: { bool copy_data; @@ -1278,3 +1325,75 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications) return tablelist; } + +/* + * merge current subpublications and user specified by add/drop
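For readers following the thread, a rough usage sketch of the syntax the patch proposes is shown below (subscription and publication names are invented, and the WITH (refresh = ...) spelling simply mirrors the options parsed in the patch above, so the final syntax may well differ):

-- Proposed: adjust the publication list incrementally.
ALTER SUBSCRIPTION mysub ADD PUBLICATION mypub_extra WITH (refresh = false);
ALTER SUBSCRIPTION mysub DROP PUBLICATION mypub_extra WITH (refresh = false);

-- Today's workaround: re-list the complete set of publications.
ALTER SUBSCRIPTION mysub SET PUBLICATION mypub_base, mypub_extra;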
RE: Parallel INSERT (INTO ... SELECT ...)
> I think we can allow parallel insert in this case, because the column value > is determined according to the DEFAULT definition of the target table > specified in the INSERT statement. This is described here: > > https://www.postgresql.org/docs/devel/sql-createtable.html > > "Defaults may be specified separately for each partition. But note that > a partition's default value is not applied when inserting a tuple through > a partitioned table." > > So the parallel-unsafe function should not be called. Thanks for the explanation. I think you are right, I did miss it. Best regards, houzj
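To make the cited documentation sentence concrete, here is a minimal illustration (table names are invented; this is not from the patch itself):

CREATE TABLE target (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE target_p1 PARTITION OF target
    (b DEFAULT 42)                        -- partition-level default
    FOR VALUES FROM (1) TO (100);

INSERT INTO target (a) VALUES (1);        -- routed through the parent: the
                                          -- partition's DEFAULT is not applied, b is NULL
INSERT INTO target_p1 (a) VALUES (2);     -- direct insert into the partition: b becomes 42

So if the DEFAULT expression on the partition called a parallel-unsafe function, an INSERT that targets the parent never evaluates it, which is why parallel insert can remain safe in this case.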
Re: automatic analyze: readahead - add "IO read time" log message
On 13/01/2021 23:17, Stephen Frost wrote: Would be great to get a review / comments from others as to if there's any concerns. I'll admit that it seems reasonably straight-forward to me, but hey, I wrote most of it, so that's not really a fair assessment... ;) Look good overall. A few minor comments: The patch consists of two part: add stats to the log for auto-analyze, and implement prefetching. They seem like independent features, consider splitting into two patches. It's a bit weird that you get more stats in the log for autovacuum/autoanalyze than you get with VACUUM/ANALYZE VERBOSE. Not really this patch's fault though. This conflicts with the patch at https://commitfest.postgresql.org/31/2907/, to refactor the table AM analyze API. That's OK, it's straightforward to resolve regardless of which patch is committed first. /* Outer loop over blocks to sample */ while (BlockSampler_HasMore()) { #ifdef USE_PREFETCH BlockNumber prefetch_targblock = InvalidBlockNumber; #endif BlockNumber targblock = BlockSampler_Next(); #ifdef USE_PREFETCH /* * Make sure that every time the main BlockSampler is moved forward * that our prefetch BlockSampler also gets moved forward, so that we * always stay out ahead. */ if (BlockSampler_HasMore(_bs)) prefetch_targblock = BlockSampler_Next(_bs); #endif vacuum_delay_point(); if (!table_scan_analyze_next_block(scan, targblock, vac_strategy)) continue; #ifdef USE_PREFETCH /* * When pre-fetching, after we get a block, tell the kernel about the * next one we will want, if there's any left. */ if (effective_io_concurrency && prefetch_targblock != InvalidBlockNumber) PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_targblock); #endif while (table_scan_analyze_next_tuple(scan, OldestXmin, , , slot)) { ... } pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE, ++blksdone); } If effective_io_concurrency == 0, this calls BlockSampler_Next(_bs) anyway, which is a waste of cycles. If table_scan_analyze_next_block() returns false, we skip the PrefetchBuffer() call. That seem wrong. Is there any potential harm from calling PrefetchBuffer() on a page that table_scan_analyze_next_block() later deems as unsuitable for smapling and returns false? That's theoretical at the moment, because heapam_scan_analyze_next_block() always returns true. (The tableam ANALYZE API refactor patch will make this moot, as it moves this logic into the tableam's implementation, so the implementation can do whatever make sense for the particular AM.) - Heikki
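To spell out the two review comments above, one possible reshuffling of the sampling loop is sketched here. This is only a sketch: the variable names (bs, prefetch_bs, blksdone, etc.) follow the quoted patch as best as can be read from it, and the actual fix may look different.

    while (BlockSampler_HasMore(&bs))
    {
        BlockNumber targblock = BlockSampler_Next(&bs);
#ifdef USE_PREFETCH
        BlockNumber prefetch_targblock = InvalidBlockNumber;

        /* Don't touch the prefetch BlockSampler at all when prefetching is disabled. */
        if (effective_io_concurrency > 0 && BlockSampler_HasMore(&prefetch_bs))
            prefetch_targblock = BlockSampler_Next(&prefetch_bs);
#endif

        vacuum_delay_point();

#ifdef USE_PREFETCH
        /*
         * Issue the prefetch before deciding whether the current block is
         * usable, so that a "false" result from
         * table_scan_analyze_next_block() below does not silently drop the
         * prefetch request.
         */
        if (prefetch_targblock != InvalidBlockNumber)
            PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_targblock);
#endif

        if (!table_scan_analyze_next_block(scan, targblock, vac_strategy))
            continue;

        while (table_scan_analyze_next_tuple(scan, OldestXmin,
                                             &liverows, &deadrows, slot))
        {
            /* ... sample the tuple as before ... */
        }

        pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE, ++blksdone);
    }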
Re: Error on failed COMMIT
On Mon, 2021-01-25 at 11:29 -0500, Dave Cramer wrote: > Rebased against head > > Here's my summary of the long thread above. > > This change is in keeping with the SQL spec. > > There is an argument (Tom) that says that this will annoy more people than it > will please. > I presume this is due to the fact that libpq behaviour will change. > > As the author of the JDBC driver, and I believe I speak for other driver > (NPGSQL for one) > authors as well that have implemented the protocol I would argue that the > current behaviour > is more annoying. > > We currently have to keep state and determine if COMMIT actually failed or it > ROLLED BACK. > There are a number of async drivers that would also benefit from not having > to keep state > in the session. I think this change makes sense, and I believe everybody agrees that it does, since it makes PostgreSQL more standards-compliant. About the fear that it will break users' applications: I think that the breakage will be minimal. All that will change is that a COMMIT of an aborted transaction raises an error. Applications that catch an error in a transaction and roll back will not be affected. What will be affected are applications that do *not* check for errors in the statements of a transaction, but only check for errors in the COMMIT. I don't think that happens often. I agree that some people will be hurt, but I don't think it will be a major problem. The patch applies and passes regression tests. I wonder about the introduction of the new USER_ERROR level:

 #define WARNING_CLIENT_ONLY 20  /* Warnings to be sent to client as usual, but
                                  * never to the server log. */
-#define ERROR 21                /* user error - abort transaction; return to
+#define USER_ERROR 21
+#define ERROR 22                /* user error - abort transaction; return to
                                  * known state */
 /* Save ERROR value in PGERROR so it can be restored when Win32 includes
  * modify it. We have to use a constant rather than ERROR because macros
  * are expanded only when referenced outside macros. */
 #ifdef WIN32
-#define PGERROR 21
+#define PGERROR 22
 #endif
-#define FATAL 22                /* fatal error - abort process */
-#define PANIC 23                /* take down the other backends with me */
+#define FATAL 23                /* fatal error - abort process */
+#define PANIC 24                /* take down the other backends with me */

I see that without that, COMMIT AND CHAIN does not behave correctly, since the respective regression tests fail. But I don't understand why. I think that this needs some more comments to make it clear. Is this new message level something we need to allow setting for "client_min_messages" and "log_min_messages"? Yours, Laurenz Albe
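For readers skimming the thread, the user-visible change boils down to something like the following simplified illustration (the table t is invented; this is not taken from the patch's regression tests):

BEGIN;
INSERT INTO t VALUES (1);
SELECT 1/0;     -- fails: the transaction is now in an aborted state
COMMIT;
-- Today: the COMMIT reports success with a ROLLBACK command tag, and the
--        client has to inspect that tag to learn that nothing was committed.
-- With the patch: the COMMIT itself raises an error, as the SQL standard
--        expects, so drivers no longer need to track transaction state
--        themselves.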
Re: [PoC] Non-volatile WAL buffer
Dear everyone, Tomas, First of all, the "v4" patchset for non-volatile WAL buffer attached to the previous mail is actually v5... Please read "v4" as "v5." Then, to Tomas: Thank you for your crash report you gave on Nov 27, 2020, regarding msync patchset. I applied the latest msync patchset v3 attached to the previous to master 411ae64 (on Jan18, 2021) then tested it, and I got no error when pgbench -i -s 500. Please try it if necessary. Best regards, Takashi 2021年1月26日(火) 17:52 Takashi Menjo : > Dear everyone, > > Sorry but I forgot to attach my patchsets... Please see the files attached > to this mail. Please also note that they contain some fixes. > > Best regards, > Takashi > > > 2021年1月26日(火) 17:46 Takashi Menjo : > >> Dear everyone, >> >> I'm sorry for the late reply. I rebase my two patchsets onto the latest >> master 411ae64.The one patchset prefixed with v4 is for non-volatile WAL >> buffer; the other prefixed with v3 is for msync. >> >> I will reply to your thankful feedbacks one by one within days. Please >> wait for a moment. >> >> Best regards, >> Takashi >> >> >> 01/25/2021(Mon) 11:56 Masahiko Sawada : >> >>> On Fri, Jan 22, 2021 at 11:32 AM Tomas Vondra >>> wrote: >>> > >>> > >>> > >>> > On 1/21/21 3:17 AM, Masahiko Sawada wrote: >>> > > On Thu, Jan 7, 2021 at 2:16 AM Tomas Vondra >>> > > wrote: >>> > >> >>> > >> Hi, >>> > >> >>> > >> I think I've managed to get the 0002 patch [1] rebased to master and >>> > >> working (with help from Masahiko Sawada). It's not clear to me how >>> it >>> > >> could have worked as submitted - my theory is that an incomplete >>> patch >>> > >> was submitted by mistake, or something like that. >>> > >> >>> > >> Unfortunately, the benchmark results were kinda disappointing. For a >>> > >> pgbench on scale 500 (fits into shared buffers), an average of three >>> > >> 5-minute runs looks like this: >>> > >> >>> > >> branch 116326496 >>> > >> >>> > >> master 7291 87704165310150437224186 >>> > >> ntt 7912106095213206212410237819 >>> > >> simple-no-buffers 7654 96544115416 95828103065 >>> > >> >>> > >> NTT refers to the patch from September 10, pre-allocating a large >>> WAL >>> > >> file on PMEM, and simple-no-buffers is the simpler patch simply >>> removing >>> > >> the WAL buffers and writing directly to a mmap-ed WAL segment on >>> PMEM. >>> > >> >>> > >> Note: The patch is just replacing the old implementation with mmap. >>> > >> That's good enough for experiments like this, but we probably want >>> to >>> > >> keep the old one for setups without PMEM. But it's good enough for >>> > >> testing, benchmarking etc. >>> > >> >>> > >> Unfortunately, the results for this simple approach are pretty bad. >>> Not >>> > >> only compared to the "ntt" patch, but even to master. I'm not >>> entirely >>> > >> sure what's the root cause, but I have a couple hypotheses: >>> > >> >>> > >> 1) bug in the patch - That's clearly a possibility, although I've >>> tried >>> > >> tried to eliminate this possibility. >>> > >> >>> > >> 2) PMEM is slower than DRAM - From what I know, PMEM is much faster >>> than >>> > >> NVMe storage, but still much slower than DRAM (both in terms of >>> latency >>> > >> and bandwidth, see [2] for some data). It's not terrible, but the >>> > >> latency is maybe 2-3x higher - not a huge difference, but may >>> matter for >>> > >> WAL buffers? >>> > >> >>> > >> 3) PMEM does not handle parallel writes well - If you look at [2], >>> > >> Figure 4(b), you'll see that the throughput actually *drops" as the >>> > >> number of threads increase. 
That's pretty strange / annoying, >>> because >>> > >> that's how we write into WAL buffers - each thread writes it's own >>> data, >>> > >> so parallelism is not something we can get rid of. >>> > >> >>> > >> I've added some simple profiling, to measure number of calls / time >>> for >>> > >> each operation (use -DXLOG_DEBUG_STATS to enable). It accumulates >>> data >>> > >> for each backend, and logs the counts every 1M ops. >>> > >> >>> > >> Typical stats from a concurrent run looks like this: >>> > >> >>> > >> xlog stats cnt 4300 >>> > >>map cnt 100 time 5448333 unmap cnt 100 time 3730963 >>> > >>memcpy cnt 985964 time 1550442272 len 15150499 >>> > >>memset cnt 0 time 0 len 0 >>> > >>persist cnt 13836 time 10369617 len 16292182 >>> > >> >>> > >> The times are in nanoseconds, so this says the backend did 100 >>> mmap and >>> > >> unmap calls, taking ~10ms in total. There were ~14k pmem_persist >>> calls, >>> > >> taking 10ms in total. And the most time (~1.5s) was used by >>> pmem_memcpy >>> > >> copying about 15MB of data. That's quite a lot :-( >>> > > >>> > > It might also be interesting if we can see how much time spent on >>> each >>> > > logging function,
Re: Protect syscache from bloating with negative cache entries
Hi, On 19/11/2020 07:25, Kyotaro Horiguchi wrote: Performance measurement on the attached showed better results for searching but maybe worse for cache entry creation. Each time number is the mean of 10 runs.

# Catcache (negative) entry creation : time(ms) (% to master)
master     : 3965.61 (100.0)
patched-off: 4040.93 (101.9)
patched-on : 4032.22 (101.7)

# Searching negative cache entries
master     : 8173.46 (100.0)
patched-off: 7983.43 ( 97.7)
patched-on : 8049.88 ( 98.5)

# Creation, searching and expiration
master     : 6393.23 (100.0)
patched-off: 6527.94 (102.1)
patched-on : 15880.01 (248.4)

That is, catcache searching gets faster by 2-3% but creation gets slower by about 2%. If I moved the condition of 2 further up to CatalogCacheCreateEntry(), that degradation reduced to 0.6%.

# Catcache (negative) entry creation
master      : 3967.45 (100.0)
patched-off : 3990.43 (100.6)
patched-on  : 4108.96 (103.6)

# Searching negative cache entries
master      : 8106.53 (100.0)
patched-off : 8036.61 ( 99.1)
patched-on  : 8058.18 ( 99.4)

# Creation, searching and expiration
master      : 6395.00 (100.0)
patched-off : 6416.57 (100.3)
patched-on  : 15830.91 (247.6)

Can you share the exact script or steps to reproduce these numbers? I presume these are from the catcachebench extension, but I can't figure out which scenario above corresponds to which catcachebench test. Also, catcachebench seems to depend on a bunch of tables being created in a schema called "test"; what tables did you use for the above numbers? - Heikki
Re: Add SQL function for SHA1
> On 26 Jan 2021, at 04:28, Noah Misch wrote: > > On Mon, Jan 25, 2021 at 10:12:28PM +0900, Michael Paquier wrote: >> SHA-1 is now an option available for cryptohashes, and like the >> existing set of functions of SHA-2, I don't really see a reason why we >> should not have a SQL function for SHA1. > > NIST deprecated SHA1 over ten years ago. It's too late to be adding this. Agreed, and pgcrypto already allows for using sha1. It seems like any legitimate need for sha1 could be better served by an extension rather than supplying it in-core. -- Daniel Gustafsson https://vmware.com/
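For reference, the extension route already works today; for example, with pgcrypto installed:

CREATE EXTENSION IF NOT EXISTS pgcrypto;
SELECT encode(digest('hello world', 'sha1'), 'hex');
-- 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed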
Re: FETCH FIRST clause PERCENT option
On Mon, Jan 25, 2021 at 6:39 AM Kyotaro Horiguchi wrote: > > Sorry for the dealy. I started to look this. > > At Fri, 25 Sep 2020 12:25:24 +0300, Surafel Temesgen > wrote in > > Hi Michael > > On Thu, Sep 24, 2020 at 6:58 AM Michael Paquier wrote: > > > > > On Mon, Aug 10, 2020 at 01:23:44PM +0300, Surafel Temesgen wrote: > > > > I also Implement PERCENT WITH TIES option. patch is attached > > > > i don't start a new tread because the patches share common code > > > > > > This fails to apply per the CF bot. Could you send a rebase? > > This still applies on the master HEAD. > > percent-incremental-v11.patch > > The existing nodeLimit passes the slot of the subnode to the > caller. but this patch changes that behavior. You added a new function > to tuplestore.c not to store a minimal tuple into the slot that passed > from subnode, but we should refrain from scribbling on the slot passed > from the subnode. Instead, the PERCENT path of the limit node should > use its own ResultTupleSlot for the purpose. See nodeSort for a > concrete example. > > > + LIMIT_OPTION_PER_WITH_TIES, /* FETCH FIRST... PERCENT WITH TIES */ > > That name is a bit hard to read. We should spell it with complete > words. > > case LIMIT_INWINDOW: > ... > + if (IsPercentOption(node->limitOption) && > node->backwardPosition > ... > + if (IsPercentOption(node->limitOption) && > node->reachEnd) > ... > + if (IsPercentOption(node->limitOption)) > > I think we can use separate lstate state for each condition above > since IsPercentOption() gives a constant result through the execution > time. For example, LIMIT_PERCENT_TUPLESLOT_NOT_FILLED and > LIMIT_PERCENT_TUPLESLOT_FILLED and some derived states similar to the > non-percent path. I *feel* that makes code simpler. > > What do you think about this? > > regards. > > -- > Kyotaro Horiguchi > NTT Open Source Software Center > > Hi, I was testing this and found that it doesn't work well with subselects, this query make the server crash: """ select * from pg_type where exists (select 1 from pg_authid fetch first 10 percent rows only); """ postgres was compiled with these options: """ CFLAGS="-ggdb -O0 -g3 -fno-omit-frame-pointer" ./configure --prefix=/opt/var/pgdg/14dev/percent --enable-debug --enable-cassert --with-pgport=54314 --enable-depend """ attached is the stack trace -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 set = {__val = {0, 140724437079104, 2, 6, 6458975, 94647092334544, 4611686018427388799, 139701007526566, 0, 281470681751456, 0, 0, 0, 0, 0, 0}} pid = tid = ret = #1 0x7f0eacb55535 in __GI_abort () at abort.c:79 save_stage = 1 act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0, 0, 0, 0, 0, 139701005283317, 2, 3760846768312216920, 7018409443653412194, 94647092334544, 7003717752975484000, 0, 3976935554753230848, 140724437079344, 0, 140724437080208}}, sa_flags = -1101967408, sa_restorer = 0x0} sigs = {__val = {32, 0 }} #2 0x5614beb8d44e in ExceptionalCondition (conditionName=0x5614bed22e7b "limit->consttype == INT8OID", errorType=0x5614bed22b76 "FailedAssertion", fileName=0x5614bed22c44 "subselect.c", lineNumber=1594) at assert.c:69 No locals. 
#3 0x5614be8df87d in simplify_EXISTS_query (root=0x5614bfecf4b8, query=0x5614bfe038e0) at subselect.c:1594 node = 0x5614bfed0190 limit = 0x5614bfed0190 #4 0x5614be8df49b in convert_EXISTS_sublink_to_join (root=0x5614bfecf4b8, sublink=0x5614bfe036e0, under_not=false, available_rels=0x5614bfecf1d8) at subselect.c:1420 result = 0x5614bfecf940 parse = 0x5614bfe039f8 subselect = 0x5614bfe038e0 whereClause = 0x5614bfecf8b8 rtoffset = 22036 varno = 22036 clause_varnos = 0x18 upper_varnos = 0x20 #5 0x5614be8e3d8d in pull_up_sublinks_qual_recurse (root=0x5614bfecf4b8, node=0x5614bfe036e0, jtlink1=0x7ffcf6155a48, available_rels1=0x5614bfecf1d8, jtlink2=0x0, available_rels2=0x0) at prepjointree.c:458 sublink = 0x5614bfe036e0 j = 0x0 child_rels = 0x5614bfecf940 #6 0x5614be8e38cc in pull_up_sublinks_jointree_recurse (root=0x5614bfecf4b8, jtnode=0x5614bfecf3d0, relids=0x7ffcf6155ab0) at prepjointree.c:275 frelids = 0x5614bfecf1d8 newf = 0x5614bfecf928 jtlink = 0x5614bfecf928 f = 0x5614bfecf3d0 newfromlist = 0x5614bfecf8d0 l = 0x0 __func__ = "pull_up_sublinks_jointree_recurse" #7 0x5614be8e36ce in pull_up_sublinks (root=0x5614bfecf4b8) at prepjointree.c:214 jtnode = 0x5614bfecf6e8 relids = 0x0 #8 0x5614be8c9e6a in subquery_planner (glob=0x5614bfe2acc0,
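From the stack trace, the assertion fires in simplify_EXISTS_query(), which assumes that a constant LIMIT on an EXISTS subquery is always an int8 Const before it strips the LIMIT away. One possible direction for the patch (purely a guess from the trace; the enum value names below are taken from, or inferred from, the quoted patch and may not match its final form) would be to refuse that simplification for percent limits, along these lines:

    /* In simplify_EXISTS_query(), before inspecting query->limitCount: */
    if (query->limitOption == LIMIT_OPTION_PERCENT ||
        query->limitOption == LIMIT_OPTION_PER_WITH_TIES)
        return false;   /* don't try to simplify EXISTS with a percent limit */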
Re: shared tempfile was not removed on statement_timeout
On 26/01/2021 06:46, Kyotaro Horiguchi wrote: Looking the comment of SharedFileSetOnDetach: | * everything in them. We can't raise an error on failures, because this runs | * in error cleanup paths. I feel that a function that shouldn't error-out also shouldn't be cancellable. If that's the case, we omit the CHECK_FOR_INTERRUPTS() in walkdir() when elevel is smaller than ERROR. = diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index b58502837a..593c23553e 100644 --- a/src/backend/storage/file/fd.c +++ b/src/backend/storage/file/fd.c @@ -3374,7 +3374,9 @@ walkdir(const char *path, { charsubpath[MAXPGPATH * 2]; - CHECK_FOR_INTERRUPTS(); + /* omit interrupts while we shouldn't error-out */ + if (elevel >= ERROR) + CHECK_FOR_INTERRUPTS(); if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0) = Don't we potentially have the same problem with all on_dsm_detach callbacks? Looking at the other on_dsm_detach callbacks, I don't see any CHECK_FOR_INTERRUPT() calls in them, but it seems fragile to assume that. I'd suggest adding HOLD/RESUME_INTERRUPTS() to dsm_detach(). At least around the removal of the callback from the list and calling the callback. Maybe even over the whole function. - Heikki
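To make the suggestion concrete, the shape I have in mind inside dsm_detach() is roughly the following. This is a sketch written from memory of dsm.c, not a tested patch, so the struct and field names should be double-checked against the source:

    /*
     * Run the on_dsm_detach callbacks with interrupts held off, so that a
     * CHECK_FOR_INTERRUPTS() buried somewhere in a callback (e.g. inside
     * walkdir()) cannot throw out of this cleanup path.
     */
    HOLD_INTERRUPTS();

    while (!slist_is_empty(&seg->on_detach))
    {
        slist_node *node = slist_pop_head_node(&seg->on_detach);
        dsm_segment_detach_callback *cb =
            slist_container(dsm_segment_detach_callback, node, node);
        on_dsm_detach_callback function = cb->function;
        Datum       arg = cb->arg;

        pfree(cb);
        function(seg, arg);
    }

    RESUME_INTERRUPTS();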
Re: [PoC] Non-volatile WAL buffer
Dear everyone, I'm sorry for the late reply. I rebase my two patchsets onto the latest master 411ae64.The one patchset prefixed with v4 is for non-volatile WAL buffer; the other prefixed with v3 is for msync. I will reply to your thankful feedbacks one by one within days. Please wait for a moment. Best regards, Takashi 01/25/2021(Mon) 11:56 Masahiko Sawada : > On Fri, Jan 22, 2021 at 11:32 AM Tomas Vondra > wrote: > > > > > > > > On 1/21/21 3:17 AM, Masahiko Sawada wrote: > > > On Thu, Jan 7, 2021 at 2:16 AM Tomas Vondra > > > wrote: > > >> > > >> Hi, > > >> > > >> I think I've managed to get the 0002 patch [1] rebased to master and > > >> working (with help from Masahiko Sawada). It's not clear to me how it > > >> could have worked as submitted - my theory is that an incomplete patch > > >> was submitted by mistake, or something like that. > > >> > > >> Unfortunately, the benchmark results were kinda disappointing. For a > > >> pgbench on scale 500 (fits into shared buffers), an average of three > > >> 5-minute runs looks like this: > > >> > > >> branch 116326496 > > >> > > >> master 7291 87704165310150437224186 > > >> ntt 7912106095213206212410237819 > > >> simple-no-buffers 7654 96544115416 95828103065 > > >> > > >> NTT refers to the patch from September 10, pre-allocating a large WAL > > >> file on PMEM, and simple-no-buffers is the simpler patch simply > removing > > >> the WAL buffers and writing directly to a mmap-ed WAL segment on PMEM. > > >> > > >> Note: The patch is just replacing the old implementation with mmap. > > >> That's good enough for experiments like this, but we probably want to > > >> keep the old one for setups without PMEM. But it's good enough for > > >> testing, benchmarking etc. > > >> > > >> Unfortunately, the results for this simple approach are pretty bad. > Not > > >> only compared to the "ntt" patch, but even to master. I'm not entirely > > >> sure what's the root cause, but I have a couple hypotheses: > > >> > > >> 1) bug in the patch - That's clearly a possibility, although I've > tried > > >> tried to eliminate this possibility. > > >> > > >> 2) PMEM is slower than DRAM - From what I know, PMEM is much faster > than > > >> NVMe storage, but still much slower than DRAM (both in terms of > latency > > >> and bandwidth, see [2] for some data). It's not terrible, but the > > >> latency is maybe 2-3x higher - not a huge difference, but may matter > for > > >> WAL buffers? > > >> > > >> 3) PMEM does not handle parallel writes well - If you look at [2], > > >> Figure 4(b), you'll see that the throughput actually *drops" as the > > >> number of threads increase. That's pretty strange / annoying, because > > >> that's how we write into WAL buffers - each thread writes it's own > data, > > >> so parallelism is not something we can get rid of. > > >> > > >> I've added some simple profiling, to measure number of calls / time > for > > >> each operation (use -DXLOG_DEBUG_STATS to enable). It accumulates data > > >> for each backend, and logs the counts every 1M ops. > > >> > > >> Typical stats from a concurrent run looks like this: > > >> > > >> xlog stats cnt 4300 > > >>map cnt 100 time 5448333 unmap cnt 100 time 3730963 > > >>memcpy cnt 985964 time 1550442272 len 15150499 > > >>memset cnt 0 time 0 len 0 > > >>persist cnt 13836 time 10369617 len 16292182 > > >> > > >> The times are in nanoseconds, so this says the backend did 100 mmap > and > > >> unmap calls, taking ~10ms in total. There were ~14k pmem_persist > calls, > > >> taking 10ms in total. 
And the most time (~1.5s) was used by > pmem_memcpy > > >> copying about 15MB of data. That's quite a lot :-( > > > > > > It might also be interesting if we can see how much time spent on each > > > logging function, such as XLogInsert(), XLogWrite(), and XLogFlush(). > > > > > > > Yeah, we could extend it to that, that's fairly mechanical thing. Bbut > > maybe that could be visible in a regular perf profile. Also, I suppose > > most of the time will be used by the pmem calls, shown in the stats. > > > > >> > > >> My conclusion from this is that eliminating WAL buffers and writing > WAL > > >> directly to PMEM (by memcpy to mmap-ed WAL segments) is probably not > the > > >> right approach. > > >> > > >> I suppose we should keep WAL buffers, and then just write the data to > > >> mmap-ed WAL segments on PMEM. Which I think is what the NTT patch > does, > > >> except that it allocates one huge file on PMEM and writes to that > > >> (instead of the traditional WAL segments). > > >> > > >> So I decided to try how it'd work with writing to regular WAL > segments, > > >> mmap-ed ad hoc. The pmem-with-wal-buffers-master.patch patch does > that, > > >> and the results look a bit nicer: > >
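As background for readers who have not used libpmem, the map-copy-persist cycle that the "simple-no-buffers" approach above relies on looks roughly like the following standalone toy program. The file path, segment size handling and record contents are invented for illustration; the real patches of course integrate this into xlog.c per WAL segment rather than a main() function.

#include <libpmem.h>
#include <stdio.h>
#include <string.h>

#define WAL_SEG_SIZE (16 * 1024 * 1024)

int
main(void)
{
    size_t      mapped_len;
    int         is_pmem;
    char       *seg;
    const char *rec = "fake WAL record payload";

    /* Map (creating if necessary) one WAL segment on a DAX filesystem. */
    seg = pmem_map_file("/mnt/pmem/pg_wal/000000010000000000000001",
                        WAL_SEG_SIZE, PMEM_FILE_CREATE, 0600,
                        &mapped_len, &is_pmem);
    if (seg == NULL)
    {
        perror("pmem_map_file");
        return 1;
    }

    if (is_pmem)
    {
        /*
         * Copy the record and flush it from the CPU caches in one call;
         * this is the pmem_memcpy/pmem_persist work counted in the stats
         * quoted above.
         */
        pmem_memcpy_persist(seg, rec, strlen(rec));
    }
    else
    {
        /* Not real persistent memory: fall back to memcpy + msync. */
        memcpy(seg, rec, strlen(rec));
        pmem_msync(seg, strlen(rec));
    }

    pmem_unmap(seg, mapped_len);
    return 0;
}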
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On Tue, Jan 26, 2021 at 1:55 PM Fujii Masao wrote: > Thanks for the patch! I also created that patch, confirmed that the test > successfully passed with -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS, > and pushed the patch. Thanks a lot! With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On 2021/01/26 17:07, Bharath Rupireddy wrote: On Tue, Jan 26, 2021 at 1:27 PM Fujii Masao wrote: Yes, so I pushed that change to stabilize the regression test. Let's keep checking how the results of buildfarm members are changed. Sorry, I'm unfamiliar with checking the system status on the build farm website - https://buildfarm.postgresql.org/cgi-bin/show_failures.pl. I'm trying to figure that out. +WARNING: roles created by regression test cases should have names starting with "regress_" CREATE ROLE multi_conn_user2 SUPERUSER; +WARNING: roles created by regression test cases should have names starting with "regress_" Hmm... another failure happened. My bad. I should have caught that earlier. I will take care in future. Attaching a patch to fix it. Thanks for the patch! I also created that patch, confirmed that the test successfully passed with -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS, and pushed the patch. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On Tue, Jan 26, 2021 at 1:27 PM Fujii Masao wrote: > > Yes, so I pushed that change to stabilize the regression test. > > Let's keep checking how the results of buildfarm members are changed. Sorry, I'm unfamiliar with checking the system status on the buildfarm website - https://buildfarm.postgresql.org/cgi-bin/show_failures.pl. I'm trying to figure that out. > +WARNING: roles created by regression test cases should have names starting > with "regress_" > CREATE ROLE multi_conn_user2 SUPERUSER; > +WARNING: roles created by regression test cases should have names starting > with "regress_" > > Hmm... another failure happened. My bad. I should have caught that earlier. I will take more care in the future. Attaching a patch to fix it. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com v1-0001-Stabilize-test-case-for-postgres_fdw_disconnect_a.patch Description: Binary data
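For completeness, the fix is simply to follow the buildfarm's naming convention for objects created by regression tests, along these lines (a sketch; the exact role names in the committed test may differ):

-- Roles created by regression tests must carry the regress_ prefix so that
-- builds with -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS do not warn.
CREATE ROLE regress_multi_conn_user1 SUPERUSER;
CREATE ROLE regress_multi_conn_user2 SUPERUSER;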