Re: Add primary keys to system catalogs
On 2021-01-18 18:23, Tom Lane wrote:
> The reason I got interested in this right now is the nearby discussion [1]
> about why findoidjoins misses some catalog relationships and whether we
> should fix that and/or make its results more readily accessible. I'd
> thought perhaps FK constraint entries could supersede the need for that
> tool altogether, but now it seems like not.

Yes, catalog consistency checking was also something I had in mind.

There are clearly a number of complications still to resolve. But in
general, representing relationships between tables in the DDL/schema
sounds sensible to me.

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/
Re: Add primary keys to system catalogs
On 2021-01-18 00:35, Robert Haas wrote:
> I don't have any complaint about labelling some of the unique indexes as
> primary keys, but I think installing foreign keys that don't really
> enforce anything may lead to confusion.

FWIW, "not enforced" constraints (such as foreign keys) are a feature that
is in the SQL standard and in other SQL implementations.

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Dec 21, 2020 at 1:50 PM Hou, Zhijie wrote:
> Hi
>
> +
> +	index_oid_list = RelationGetIndexList(rel);
> ...
>
> As mentioned in the comments of RelationGetIndexList:
> * we return a copy of the list palloc'd in the caller's context. The caller
> * may list_free() the returned list after scanning it.
>
> Shall we list_free(index_oid_list) at the end of the function?
> Just to avoid a potential memory leak.

I think that's a good idea, so I'll make that update in the next version
of the patch.

I do notice, however, that there seem to be quite a few places in the
Postgres code where RelationGetIndexList() is called without a
corresponding list_free(), obviously relying instead on the list being
deallocated when the caller's memory context is destroyed.

Regards,
Greg Nancarrow
Fujitsu Australia
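For reference, the suggested pattern amounts to something like the following sketch against the backend APIs (it only compiles inside the PostgreSQL source tree, and the loop body here is invented for illustration):

```c
/* Sketch only: free the palloc'd index-OID list explicitly rather than
 * relying on memory-context teardown, per the comment at
 * RelationGetIndexList(). */
List	   *index_oid_list;
ListCell   *lc;

index_oid_list = RelationGetIndexList(rel);
foreach(lc, index_oid_list)
{
	Oid			index_oid = lfirst_oid(lc);

	/* ... per-index parallel-safety checks would go here ... */
}
list_free(index_oid_list);
```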
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On Thu, Jan 21, 2021 at 12:17 PM Fujii Masao wrote:
> On 2021/01/21 14:46, Bharath Rupireddy wrote:
>> On Thu, Jan 21, 2021 at 10:06 AM Fujii Masao wrote:
>>>>> +		if (entry->server_hashvalue == hashvalue &&
>>>>> +			(entry->xact_depth > 0 || result))
>>>>> +		{
>>>>> +			hash_seq_term();
>>>>> +			break;
>>>>>
>>>>> entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all()
>>>>> specifies 0 as hashvalue, ISTM that the above condition can be true
>>>>> unexpectedly. Can we replace this condition with just "if (!all)"?
>>>>
>>>> I don't think entry->server_hashvalue can be zero, because
>>>> GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
>>>> as hash value. I have not seen someone comparing hashvalue with an
>>>> expectation that it has 0 value; for instance, see
>>>> if (hashvalue == 0 || riinfo->oidHashValue == hashvalue).
>>>>
>>>> Having if(!all) something like below there doesn't suffice because we
>>>> might call hash_seq_term when some connection other than the given
>>>> foreign server connection is in use.
>>>
>>> No because we check the following condition before reaching that code. No?
>>>
>>> +		if ((all || entry->server_hashvalue == hashvalue) &&
>>>
>>> I was thinking that the "(entry->xact_depth > 0 || result)" condition is not
>>> necessary because "result" is set to true when xact_depth <= 0 and that
>>> condition always indicates true.
>>
>> I think that condition is too confusing. How about having a boolean
>> can_terminate_scan like below?
>
> Thanks for thinking this. But at least for me, "if (!all)" looks not so
> confusing. And the comment seems to explain why we can end the scan.

May I know if it's okay to have the boolean can_terminate_scan as shown in [1]?

[1] - https://www.postgresql.org/message-id/flat/CALj2ACVx0%2BiOsrAA-wXbo3RLAKqUoNvvEd7foJ0vLwOdu8XjXw%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Dec 23, 2020 at 1:45 PM Amit Kapila wrote:
> On Wed, Dec 23, 2020 at 7:52 AM Hou, Zhijie wrote:
>> Hi
>>
>>> I may be wrong, and if I miss sth in previous mails, please give me some
>>> hints.
>>>> IMO, serial insertion with underlying parallel SELECT can be
>>>> considered for foreign table or temporary table, as the insertions only
>>>> happened in the leader process.
>>>
>>> I don't think we support parallel scan for temporary tables. Can you
>>> please try once both of these operations without Insert being involved?
>>> If you are able to produce a parallel plan without Insert then we can
>>> see why it is not supported with Insert.
>>
>> Sorry, maybe I did not express it clearly; I actually meant the case when
>> the insert's target table (not the one in the select part) is temporary.
>> And you are right that parallel select is not enabled when a temporary
>> table is in the select part.
>
> I think the Select can be parallel for this case and we should support
> this case.

So I think we're saying that if the target table is a foreign table or
temporary table, it can be regarded as PARALLEL_RESTRICTED, right?

i.e. code-wise:

	/*
-	 * We can't support table modification in parallel-mode if it's a foreign
-	 * table/partition (no FDW API for supporting parallel access) or a
+	 * We can't support table modification in a parallel worker if it's a
+	 * foreign table/partition (no FDW API for supporting parallel access) or a
	 * temporary table.
	 */
	if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
		RelationUsesLocalBuffers(rel))
	{
-		table_close(rel, lockmode);
-		context->max_hazard = PROPARALLEL_UNSAFE;
-		return true;
+		if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
+		{
+			table_close(rel, lockmode);
+			return true;
+		}
	}

Regards,
Greg Nancarrow
Fujitsu Australia
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On 2021/01/21 14:46, Bharath Rupireddy wrote:
> On Thu, Jan 21, 2021 at 10:06 AM Fujii Masao wrote:
>>>> +		if (entry->server_hashvalue == hashvalue &&
>>>> +			(entry->xact_depth > 0 || result))
>>>> +		{
>>>> +			hash_seq_term();
>>>> +			break;
>>>>
>>>> entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all()
>>>> specifies 0 as hashvalue, ISTM that the above condition can be true
>>>> unexpectedly. Can we replace this condition with just "if (!all)"?
>>>
>>> I don't think entry->server_hashvalue can be zero, because
>>> GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
>>> as hash value. I have not seen someone comparing hashvalue with an
>>> expectation that it has 0 value; for instance, see
>>> if (hashvalue == 0 || riinfo->oidHashValue == hashvalue).
>>>
>>> Having if(!all) something like below there doesn't suffice because we
>>> might call hash_seq_term when some connection other than the given
>>> foreign server connection is in use.
>>
>> No because we check the following condition before reaching that code. No?
>>
>> +		if ((all || entry->server_hashvalue == hashvalue) &&
>>
>> I was thinking that the "(entry->xact_depth > 0 || result)" condition is not
>> necessary because "result" is set to true when xact_depth <= 0 and that
>> condition always indicates true.
>
> I think that condition is too confusing. How about having a boolean
> can_terminate_scan like below?

Thanks for thinking this. But at least for me, "if (!all)" looks not so
confusing. And the comment seems to explain why we can end the scan.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
RE: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
>>> Attaching v15 patch set. Please consider it for further review.
>>
>> Hi
>>
>> I have some comments for the 0001 patch
>>
>> In v15-0001-postgres_fdw-function-to-discard-cached-connecti
>>
>> 1.
>> +        If there is no open connection to the given foreign server, false
>> +        is returned. If no foreign server with the given name is found,
>> +        an error
>>
>> Do you think it's better to add some testcases about:
>> calling postgres_fdw_disconnect and postgres_fdw_disconnect_all
>> when there is no open connection to the given foreign server
>
> Do you mean a test case where the foreign server exists but
> postgres_fdw_disconnect() returns false because there's no connection for
> that server?

Yes, I read this from the doc, so I think it's better to test this.

>> 2.
>> +			/*
>> +			 * For the given server, if we closed connection or it is still in
>> +			 * use, then no need of scanning the cache further.
>> +			 */
>> +			if (entry->server_hashvalue == hashvalue &&
>> +				(entry->xact_depth > 0 || result))
>> +			{
>> +				hash_seq_term();
>> +				break;
>> +			}
>>
>> If I am not wrong, is the following condition always true?
>> (entry->xact_depth > 0 || result)
>
> It's not always true. But it seems like it's too confusing; please have
> a look at the upthread suggestion to change this with a can_terminate_scan
> boolean.

Thanks for the reminder, I will look at that.

Best regards,
houzj
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On Thu, Jan 21, 2021 at 11:15 AM Hou, Zhijie wrote:
>> Attaching v15 patch set. Please consider it for further review.
>
> Hi
>
> I have some comments for the 0001 patch
>
> In v15-0001-postgres_fdw-function-to-discard-cached-connecti
>
> 1.
> +        If there is no open connection to the given foreign server, false
> +        is returned. If no foreign server with the given name is found,
> +        an error
>
> Do you think it's better to add some testcases about:
> calling postgres_fdw_disconnect and postgres_fdw_disconnect_all when
> there is no open connection to the given foreign server

Do you mean a test case where the foreign server exists but
postgres_fdw_disconnect() returns false because there's no connection for
that server?

> 2.
> +			/*
> +			 * For the given server, if we closed connection or it is still in
> +			 * use, then no need of scanning the cache further.
> +			 */
> +			if (entry->server_hashvalue == hashvalue &&
> +				(entry->xact_depth > 0 || result))
> +			{
> +				hash_seq_term();
> +				break;
> +			}
>
> If I am not wrong, is the following condition always true?
> (entry->xact_depth > 0 || result)

It's not always true. But it seems like it's too confusing; please have a
look at the upthread suggestion to change this with a can_terminate_scan
boolean.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
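For what it's worth, a sketch of what such a test might look like, assuming the patch's proposed functions are installed ("loopback" is a hypothetical server name, and the return values reflect the documented behavior under review rather than anything verified here):

```sql
-- hypothetical regression-test fragment, assuming the patch is applied
CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw;
-- no query has used the server yet, so no cached connection exists:
SELECT postgres_fdw_disconnect('loopback');  -- should return false
SELECT postgres_fdw_disconnect_all();        -- should return false
```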
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Jan 21, 2021 at 12:47 PM Hou, Zhijie wrote:
> Hi
>
> It seems there are some previous comments [1][2] not addressed in the
> current patch. Just to make sure they're not missed.
>
> [1] https://www.postgresql.org/message-id/77e1c06ffb2240838e5fc94ec8dcb7d3%40G08CNEXMBPEKD05.g08.fujitsu.local
> [2] https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com

Thanks for alerting me to those; somehow I completely missed them, sorry
about that. I'll be sure to investigate and address them in the next
version of the patch I post.

Regards,
Greg Nancarrow
Fujitsu Australia
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On Thu, Jan 21, 2021 at 10:06 AM Fujii Masao wrote:
>>> +		if (entry->server_hashvalue == hashvalue &&
>>> +			(entry->xact_depth > 0 || result))
>>> +		{
>>> +			hash_seq_term();
>>> +			break;
>>>
>>> entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all()
>>> specifies 0 as hashvalue, ISTM that the above condition can be true
>>> unexpectedly. Can we replace this condition with just "if (!all)"?
>>
>> I don't think entry->server_hashvalue can be zero, because
>> GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
>> as hash value. I have not seen someone comparing hashvalue with an
>> expectation that it has 0 value; for instance, see
>> if (hashvalue == 0 || riinfo->oidHashValue == hashvalue).
>>
>> Having if(!all) something like below there doesn't suffice because we
>> might call hash_seq_term when some connection other than the given
>> foreign server connection is in use.
>
> No because we check the following condition before reaching that code. No?
>
> +		if ((all || entry->server_hashvalue == hashvalue) &&
>
> I was thinking that the "(entry->xact_depth > 0 || result)" condition is not
> necessary because "result" is set to true when xact_depth <= 0 and that
> condition always indicates true.

I think that condition is too confusing. How about having a boolean
can_terminate_scan like below?

	while ((entry = (ConnCacheEntry *) hash_seq_search()))
	{
		bool		can_terminate_scan = false;

		/*
		 * Either disconnect given or all the active and not in use cached
		 * connections.
		 */
		if ((all || entry->server_hashvalue == hashvalue) &&
			entry->conn)
		{
			/* We cannot close a connection that's in use, so issue a warning. */
			if (entry->xact_depth > 0)
			{
				ForeignServer *server;

				if (!all)
					can_terminate_scan = true;

				server = GetForeignServerExtended(entry->serverid,
												  FSV_MISSING_OK);

				if (!server)
				{
					/*
					 * If the server has been dropped in the current explicit
					 * transaction, then this entry would have been invalidated
					 * in pgfdw_inval_callback at the end of the drop server
					 * command. Note that this connection would not have been
					 * closed in pgfdw_inval_callback because it is still being
					 * used in the current explicit transaction. So, assert
					 * that here.
					 */
					Assert(entry->invalidated);
					ereport(WARNING,
							(errmsg("cannot close dropped server connection because it is still in use")));
				}
				else
					ereport(WARNING,
							(errmsg("cannot close connection for server \"%s\" because it is still in use",
									server->servername)));
			}
			else
			{
				elog(DEBUG3, "discarding connection %p", entry->conn);
				disconnect_pg_server(entry);
				result = true;

				if (!all)
					can_terminate_scan = true;
			}

			/*
			 * For the given server, if we closed connection or it is still in
			 * use, then no need of scanning the cache further.
			 */
			if (can_terminate_scan)
			{
				hash_seq_term();
				break;
			}
		}
	}

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
RE: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
> Attaching v15 patch set. Please consider it for further review.

Hi

I have some comments for the 0001 patch.

In v15-0001-postgres_fdw-function-to-discard-cached-connecti:

1.
+        If there is no open connection to the given foreign server, false
+        is returned. If no foreign server with the given name is found,
+        an error

Do you think it's better to add some testcases about:
calling postgres_fdw_disconnect and postgres_fdw_disconnect_all when
there is no open connection to the given foreign server

2.
+			/*
+			 * For the given server, if we closed connection or it is still in
+			 * use, then no need of scanning the cache further.
+			 */
+			if (entry->server_hashvalue == hashvalue &&
+				(entry->xact_depth > 0 || result))
+			{
+				hash_seq_term();
+				break;
+			}

If I am not wrong, is the following condition always true?
(entry->xact_depth > 0 || result)

Best regards,
houzj
Re: Support for NSS as a libpq TLS backend
On Tue, Jan 19, 2021 at 09:21:41PM +0100, Daniel Gustafsson wrote:
>> In order to move on with this set, I would suggest to extract some parts
>> of the patch set independently of the others and have two buildfarm
>> members for the MSVC and non-MSVC cases to stress the parts that can be
>> committed. Just seeing the size, we could move on with:
>> - The ./configure set, with the change to introduce --with-ssl=openssl.
>> - 0004 for strong randoms.
>> - Support for cryptohashes.
>
> I will leave it to others to decide the feasibility of this. I'm happy to
> slice and dice the commits into smaller bits to, for example, separate out
> the --with-ssl autoconf change into a non-NSS-dependent commit, if that's
> wanted.

IMO it makes sense to extract the independent pieces and build on top of
them. The bulk of the changes is likely going to get a bunch of comments
if reviewed deeply, so I think that we had better remove the small-ish
problems from the stack to ease the next moves. The ./configure part and
the replacement of with_openssl by with_ssl are mixed in 0001 and 0002,
which is actually confusing. And, FWIW, I would be fine with applying a
patch that introduces --with-ssl with compatibility kept for
--with-openssl. This is what 0001 is doing, actually, similarly to the
past switches for --with-uuid.

A point that has been mentioned offline by you, but not mentioned on this
list: the structure of the modules in src/test/ssl/ could be refactored to
help with an easier integration of more SSL libraries. This makes sense
taken independently.

> Based on an offlist discussion I believe this was a misunderstanding, but
> if I instead misunderstood that, feel free to correct me with how you
> think this should be done.

The point would be to rename BITS_PER_BYTE to PG_BITS_PER_BYTE in the code
and avoid conflicts. I am not completely sure if others would agree here,
but this would remove quite some ifdef/undef stuff from the code dedicated
to NSS.
>> src/sgml/libpq.sgml needs to document PQdefaultSSLKeyPassHook_nss, no?
>
> Good point, fixed.

Please note that patch 0001 is failing to apply after the recent commit
b663a41. There are conflicts in postgres_fdw.out.

Also, what's the minimum version of NSS that would be supported? It would
be good to define an acceptable oldest version, to keep that documented
and to track that perhaps with some configure checks (?), similarly to
what is done for OpenSSL.

Patch 0006 has three trailing whitespaces (git diff --check complains).

Running the regression tests of pgcrypto, I think that the SHA2
implementation is not completely right. Some SHA2 encoding reports results
from already-freed data. I have spotted a second issue within
scram_HMAC_init(), where pg_cryptohash_create() remains stuck inside
NSS_InitContext(), freezing the regression tests where passwords hashed
for SCRAM are created.

+	ResourceOwnerEnlargeCryptoHash(CurrentResourceOwner);
+	ctx = MemoryContextAlloc(TopMemoryContext, sizeof(pg_cryptohash_ctx));
+#else
+	ctx = pg_malloc(sizeof(pg_cryptohash_ctx));
+#endif

cryptohash_nss.c cannot use pg_malloc() for frontend allocations. On OOM,
your patch would call exit() directly, even within libpq. But shared
library callers need to know about the OOM failure.

+	explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
+	pfree(ctx);

For similar reasons, pfree should not be used for the frontend code in
cryptohash_nss.c. The fallback should be just a malloc/free set.

+	status = PK11_DigestBegin(ctx->pk11_context);
+
+	if (status != SECSuccess)
+		return 1;
+	return 0;

This needs to return -1 on failure, not 1.

I really need to study more the choice of the options chosen for
NSS_InitContext()... But based on the docs I can read on the matter, I
think that saving nsscontext in pg_cryptohash_ctx is right for each
cryptohash built.

src/tools/msvc/ is missing an update for cryptohash_nss.c.
-- 
Michael
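For reference, the configure usage under discussion would look roughly like this (a sketch based on the description of 0001; the nss value would of course only exist once the NSS patches are in):

```shell
# --with-ssl selects the TLS library; --with-openssl is kept as a
# backward-compatible alias, as 0001 proposes
./configure --with-ssl=openssl
./configure --with-openssl          # equivalent, for compatibility
./configure --with-ssl=nss          # with the NSS patch set applied
```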
Re: Tid scan improvements
On Wed, 13 Jan 2021 at 15:38, Edmund Horner wrote:
> Thanks for updating the patch. I'd be very happy if this got picked up
> again, and I'd certainly follow along and do some review.

Likewise here. I think this patch was pretty close, so it seems a shame to
let it slip through the cracks.

I spoke to Andres off-list about this patch. He mentioned that he wasn't
too keen on seeing the setscanlimits being baked into the table AM API.
He'd rather not assume too much about table AMs having all of their tids
in a given range consecutively on a set of pages. That seems reasonable to
me. He suggested that we add a new callback that just allows a range of
ItemPointers to scan, leaving it up to the implementation to decide which
pages should be scanned to fetch the tuples within the given range.

I didn't argue. I just went and coded it all, hopefully to Andres'
description. The new table AM callback is optional.

I've implemented this in the attached. I also took the time to support
backwards TID Range scans and added a few more tests to test rescans. I
just found it annoying that TID Scans supported backwards scans and TID
Range scans did not.

The 0002 patch is the guts of it. The 0001 patch is an existing bug that
needs to be fixed before 0002 could go in (backwards TID Range Scans are
broken without this). I've posted separately about this bug in [1].

I also didn't really like the idea of adding possibly lots of bool fields
to RelOptInfo to describe what the planner can do in regards to what the
given table AM supports. I know that IndexOptInfo has such a set of bool
fields. I'd rather not repeat that, so I just went with a single int field
named "amflags" and added a single constant to define a flag that
specifies whether the rel supports scanning ranges of TIDs.

Edmund, will you get time to take a look at this?
David

[1] https://www.postgresql.org/message-id/caaphdvpgc9h0_ovd2ctgbcxcs1n-qdyzsebrnuh+0cwja9c...@mail.gmail.com

From 3a0469df93690889793823fdec235f72c8fb81d7 Mon Sep 17 00:00:00 2001
From: "dgrow...@gmail.com"
Date: Thu, 21 Jan 2021 16:27:25 +1300
Subject: [PATCH v11 1/2] Fix hypothetical bug in heap backward scans

Both heapgettup() and heapgettup_pagemode() incorrectly set the first
page to scan in a backward scan in which the pages to scan was specified
by heap_setscanlimits().  In theory, this could result in the incorrect
pages being scanned.  In practice, nowhere in core code performs backward
scans after having used heap_setscanlimits().  However, it's possible an
extension uses the heap functions in this way.

For the bug to manifest, the scan must be limited to fewer than the
number of pages in the relation and start at page 0.  The scan will start
on the final page in the table rather than the final page in the range of
pages to scan.  The correct number of pages is always scanned; it's just
the pages which are scanned which can be incorrect.

This is a precursor fix to a future patch which allows TID Range scans to
scan a subset of a heap table.

Proper adjustment of the heap scan code seems to have been missed when
heap_setscanlimits() was added in 7516f5259.

Author: David Rowley
Discussion: https://postgr.es/m/caaphdvpgc9h0_ovd2ctgbcxcs1n-qdyzsebrnuh+0cwja9c...@mail.gmail.com
---
 src/backend/access/heap/heapam.c | 24
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index faffbb1865..ddd214b7af 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -603,11 +603,15 @@ heapgettup(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_base.rs_flags &= ~SO_ALLOW_SYNC;
-			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-				page = scan->rs_startblock - 1;
+
+			/*
+			 * Start from last page of the scan.  Ensure we take into account
+			 * rs_numblocks if it's been adjusted by heap_setscanlimits().
+			 */
+			if (scan->rs_numblocks != InvalidBlockNumber)
+				page = (scan->rs_startblock + scan->rs_numblocks - 1) % scan->rs_nblocks;
 			else
-				page = scan->rs_nblocks - 1;
+				page = (scan->rs_startblock + scan->rs_nblocks - 1) % scan->rs_nblocks;
 			heapgetpage((TableScanDesc) scan, page);
 		}
 		else
@@ -918,11 +922,15 @@ heapgettup_pagemode(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_base.rs_flags &= ~SO_ALLOW_SYNC;
-			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-
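For context, the sort of query the TID Range Scan work targets looks like the sketch below (table name invented); without the patch, such ctid quals are handled by a Seq Scan over the whole heap with a filter:

```sql
-- scan only the tuples in roughly the first ten heap pages of t
SELECT * FROM t WHERE ctid >= '(0,1)' AND ctid < '(10,1)';
```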
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Jan 21, 2021 at 1:50 PM Zhihong Yu wrote:
> For v12-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch:
>
> + bool isParallelModifyLeader = IsA(planstate, GatherState) &&
>   IsA(outerPlanState(planstate), ModifyTableState);
>
> Please wrap long line.

OK. I thought I ran pg_indent fairly recently, but maybe it chose not to
wrap that line.

> + uint64 *processed_count_space;
>
> If I read the code correctly, it seems it can be dropped (use
> pei->processed_count directly).

You're right. I'll change it in the next version.

Regards,
Greg Nancarrow
Fujitsu Australia
Re: [HACKERS] [PATCH] Generic type subscripting
On Wed Jan 20, 2021 at 2:08 PM EST, Dmitry Dolgov wrote:
> On Wed, Jan 20, 2021 at 11:34:16AM -0500, Dian M Fay wrote:
>>> Thanks, I need to remember to not skip doc building in the testing
>>> process even for such small changes. Hope now I didn't forget anything.
>>>
>>>> On Wed, Jan 20, 2021 at 09:58:43AM -0500, Dian M Fay wrote:
>>>>
>>>> Here's a full editing pass on the documentation, with v45 and Pavel's
>>>> doc-whitespaces-fix.patch applied. I also corrected a typo in one of
>>>> the added hints.
>>>
>>> Great! I've applied almost all of it, except:
>>>
>>> + A jsonb value will accept assignments to nonexistent subscript
>>> + paths as long as the nonexistent elements being traversed are all
>>> + arrays.
>>>
>>> Maybe I've misunderstood the intention, but there is no requirement
>>> about arrays for creating such an empty path. I've formulated it as:
>>>
>>> + A jsonb value will accept assignments to nonexistent subscript
>>> + paths as long as the last existing path key is an object or an array.
>>
>> My intention there was to highlight the difference between:
>>
>> * SET obj['a']['b']['c'] = '"newvalue"'
>> * SET arr[0][0][3] = '"newvalue"'
>>
>> obj has to conform to {"a": {"b": {...}}} in order to receive the
>> assignment of the nested c. If it doesn't, that's the error case we
>> discussed earlier. But arr can be null, [], and so on, and any missing
>> structure [[[null, null, null, "newvalue"]]] will be created.
>
> If arr is 'null', or any other scalar value, such subscripting will work
> only one level deep, because scalars are represented internally as an
> array of one element. If arr is '[]' the path will comply by definition.
> So it's essentially the same as for objects, with no particular
> difference. If such a quirk about scalars being treated like arrays is
> bothering, we could also bend it in this case as well (see the attached
> version).
I missed that distinction in the original UPDATE paragraph too. Here's
another revision based on v48.

From a486ee221469037b08d3663f1ec142a905406f8b Mon Sep 17 00:00:00 2001
From: Dian M Fay
Date: Wed, 20 Jan 2021 23:36:34 -0500
Subject: [PATCH] more jsonb subscripting documentation edits

---
 doc/src/sgml/json.sgml | 40 ++--
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index deeb9e66e0..e16dd6973d 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -616,16 +616,17 @@ SELECT jdoc-'guid', jdoc-'name' FROM api WHERE jdoc @ '{"tags": ["qu
    UPDATE statements may use subscripting in the
-   SET clause to modify jsonb values. Object
-   values being traversed must exist as specified by the subscript path. For
-   instance, the path val['a']['b']['c'] assumes that
-   val, val['a'], and val['a']['b']
-   are all objects in every record being updated (val['a']['b']
-   may or may not contain a field named c, as long as it's an
-   object). If any individual val, val['a'],
-   or val['a']['b'] is a non-object such as a string, a number,
-   or NULL, an error is raised even if other values do conform.
-   Array values are not subject to this restriction, as detailed below.
+   SET clause to modify jsonb values. Subscript
+   paths must be traversible for all affected values insofar as they exist. For
+   instance, the path val['a']['b']['c'] can be traversed all
+   the way to c if every val,
+   val['a'], and val['a']['b'] is an
+   object. If any val['a'] or val['a']['b']
+   is not defined, it will be created as an empty object and filled as
+   necessary. However, if any val itself or one of the
+   intermediary values is defined as a non-object such as a string, number, or
+   jsonb null, traversal cannot proceed so
+   an error is raised and the transaction aborted.
@@ -658,8 +659,9 @@ SELECT * FROM table_name WHERE jsonb_field['key'] = '"value"';
    jsonb assignment via subscripting handles a few edge cases
    differently from jsonb_set. When a source jsonb
-   is NULL, assignment via subscripting will proceed as if
-   it was an empty JSON object:
+   value is NULL, assignment via subscripting will proceed
+   as if it was an empty JSON value of the type (object or array) implied by the
+   subscript key:

 -- Where jsonb_field was NULL, it is now {"a": 1}

@@ -680,17 +682,19 @@ UPDATE table_name SET jsonb_field[2] = '2';
    A jsonb value will accept assignments to nonexistent subscript
-   paths as long as the last existing path key is an object or an array. Since
-   the final subscript is not traversed, it may be an object key. Nested arrays
-   will be created and NULL-padded according to the path until
-   the value can be placed appropriately.
+   paths as long as the last existing element to be traversed is an object or
+   array, as implied by the corresponding subscript (the element indicated by
+   the last subscript
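To illustrate the distinction being documented, a hypothetical session under the patched subscripting behavior might look like this (table and column names invented; the expected results follow the semantics described in the patch rather than anything tested here):

```sql
-- object case: intermediate objects must either exist or be absent
-- (in which case they are created empty); a non-object intermediate
-- value such as a string or number raises an error instead
UPDATE table_name SET jsonb_field['a']['b']['c'] = '"newvalue"';

-- array case: missing structure is created and NULL-padded, so a NULL
-- arr_field could become [[[null, null, null, "newvalue"]]]
UPDATE table_name SET arr_field[0][0][3] = '"newvalue"';
```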
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On 2021/01/21 12:00, Bharath Rupireddy wrote: On Wed, Jan 20, 2021 at 6:58 PM Fujii Masao wrote: + * It checks if the cache has a connection for the given foreign server that's + * not being used within current transaction, then returns true. If the + * connection is in use, then it emits a warning and returns false. The comment also should mention the case where no open connection for the given server is found? What about rewriting this to the following? - If the cached connection for the given foreign server is found and has not been used within current transaction yet, close the connection and return true. Even when it's found, if it's already used, keep the connection, emit a warning and return false. If it's not found, return false. - Done. + * It returns true, if it closes at least one connection, otherwise false. + * + * It returns false, if the cache doesn't exit. The above second comment looks redundant. Yes. "otherwise false" means it. + if (ConnectionHash) + result = disconnect_cached_connections(0, true); Isn't it smarter to make disconnect_cached_connections() check ConnectionHash and return false if it's NULL? If we do that, we can simplify the code of postgres_fdw_disconnect() and _all(). Done. + * current transaction are disconnected. Otherwise, the unused entries with the + * given hashvalue are disconnected. In the above second comment, a singular form should be used instead? Because there must be no multiple entries with the given hashvalue. Rephrased the function comment a bit. Mentioned the _disconnect and _disconnect_all in comments because we have said enough what each of those two functions do. +/* + * Workhorse to disconnect cached connections. + * + * This function disconnects either all unused connections when called from + * postgres_fdw_disconnect_all or a given foreign server unused connection when + * called from postgres_fdw_disconnect. 
+ * + * This function returns true if at least one connection is disconnected, + * otherwise false. + */ + server = GetForeignServer(entry->serverid); This seems to cause an error "cache lookup failed" if postgres_fdw_disconnect_all() is called when there is a connection in use but its server is dropped. To avoid this error, GetForeignServerExtended() with FSV_MISSING_OK should be used instead, like postgres_fdw_get_connections() does? +1. So, I changed it to GetForeignServerExtended, added an assertion for invalidation just like postgres_fdw_get_connections. I also added a test case for this, we now emit a slightly different warning for this case alone that is (errmsg("cannot close dropped server connection because it is still in use")));. This warning looks okay as we cannot show any other server name in the ouput and we know that this rare case can exist when someone drops the server in an explicit transaction. + if (entry->server_hashvalue == hashvalue && + (entry->xact_depth > 0 || result)) + { + hash_seq_term(); + break; entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all() specifies 0 as hashvalue, ISTM that the above condition can be true unexpectedly. Can we replace this condition with just "if (!all)"? I don't think so entry->server_hashvalue can be zero, because GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0 as hash value. I have not seen someone comparing hashvalue with an expectation that it has 0 value, for instance see if (hashvalue == 0 || riinfo->oidHashValue == hashvalue). Having if(!all) something like below there doesn't suffice because we might call hash_seq_term, when some connection other than the given foreign server connection is in use. No because we check the following condition before reaching that code. No? 
+ if ((all || entry->server_hashvalue == hashvalue) && I was thinking that the "(entry->xact_depth > 0 || result)" condition is not necessary because "result" is set to true when xact_depth <= 0, so that condition always evaluates to true there. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
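An aside for readers following the hashvalue discussion above: the convention being relied on — a syscache hash value is never 0 for a live object, so 0 can serve as a "match everything" sentinel — can be sketched with a small stand-alone analog. The struct and function below are invented for illustration and deliberately simplified; the real logic lives in postgres_fdw's connection cache and iterates with dynahash's hash_seq_search()/hash_seq_term():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Invented stand-in for a cached-connection entry. */
typedef struct CacheEntry
{
    uint32_t server_hashvalue; /* never 0 for a live entry */
    int      xact_depth;       /* > 0 means "in use by a transaction" */
    bool     closed;
} CacheEntry;

/*
 * Close every unused entry matching `hashvalue`; hashvalue == 0 (a value
 * the syscache never produces for a live object) means "all entries".
 * Returns true if at least one entry was closed.
 */
static bool
disconnect_matching(CacheEntry *entries, int n, uint32_t hashvalue, bool all)
{
    bool result = false;

    for (int i = 0; i < n; i++)
    {
        CacheEntry *e = &entries[i];

        if ((all || e->server_hashvalue == hashvalue) && e->xact_depth == 0)
        {
            e->closed = true;
            result = true;
        }

        /*
         * Single-server case: once the one matching entry has been seen
         * (whether closed or still in use), the scan can stop early --
         * the analog of the hash_seq_term() call under discussion.
         */
        if (!all && e->server_hashvalue == hashvalue)
            break;
    }
    return result;
}
```

Per Fujii's observation, once the single matching entry has been visited the scan can stop unconditionally, which is why the extra `(xact_depth > 0 || result)` part of the termination test in the patch is redundant.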
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Jan 21, 2021 at 1:28 PM Zhihong Yu wrote: > > Hi, > For v12-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch : > > is found from the additional parallel-safety checks, or from the existing > parallel-safety checks for SELECT that it currently performs. > > existing and 'it currently performs' are redundant. You can omit 'that it > currently performs'. > OK, but this is very minor. > Minor. For index_expr_max_parallel_hazard_for_modify(), > > + if (keycol == 0) > + { > + /* Found an index expression */ > > You can check if keycol != 0, continue with the loop. This would save some > indent. Yes I know, but I don't really see any issue with indent (I'm using 4-space tabs). > > + if (index_expr_item == NULL)/* shouldn't happen */ > + { > + elog(WARNING, "too few entries in indexprs list"); > > I think the warning should be an error since there is assertion ahead of the > if statement. > Assertions are normally for DEBUG builds, so the Assert would have no effect in a production (release) build. Besides, as I have explained in my reply to previous feedback, the logging of a WARNING (rather than ERROR) is intentional, because I want processing to continue (not stop) if this very rare condition were ever to occur - so that the issue can be dealt with by the current non-parallel processing (rather than stopping dead in the middle of the parallel-safety-checking code). For a DEBUG build, it is handy for the Assert to immediately alert us to the issue (which could really only be caused by database corruption, not a bug in the code). Note that Vignesh originally suggested changing it from "elog(ERROR,...)" to "elog(WARNING,...)", and I agree with him. Regards, Greg Nancarrow Fujitsu Australia
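Greg's point about Assert being compiled out of production builds can be illustrated with standard C assert(), which PostgreSQL's Assert resembles in this respect: both become no-ops unless assertions are enabled at build time (-DNDEBUG disables the standard one). The function and names below are invented stand-ins for the pattern being defended, not the actual patch code:

```c
#include <assert.h>
#include <stdio.h>

/* Invented stand-in for the parallel-safety result codes. */
typedef enum { SAFE, UNSAFE } Hazard;

/*
 * The "Assert plus WARNING fallback" pattern: in an assert-enabled build
 * the assert stops execution immediately; in a production build it
 * vanishes, so the function logs a warning and returns UNSAFE, letting
 * the caller fall back to the non-parallel path instead of erroring out.
 */
static Hazard
check_entry(const void *entry)
{
    assert(entry != NULL);      /* compiled out with -DNDEBUG */
    if (entry == NULL)
    {
        fprintf(stderr, "WARNING: too few entries in indexprs list\n");
        return UNSAFE;
    }
    return SAFE;
}
```

The practical consequence is exactly what the thread describes: keeping the elog(WARNING, ...) alongside the Assert costs nothing in debug builds but keeps release builds from proceeding on garbage.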
Re: ResourceOwner refactoring
On Thu, Jan 21, 2021 at 12:11:37AM +0200, Heikki Linnakangas wrote: > Summary: In the worst scenario, the patched version is still 24% slower > than unpatched. But many other scenarios are now faster with the patch. Is there a reason explaining the sudden drop for numsnaps within the [10,60] range? The gap looks deeper with a low numkeep. -- Michael signature.asc Description: PGP signature
Re: pg_upgrade fails with non-standard ACL
On Thu, Jan 21, 2021 at 01:03:58AM +0300, Anastasia Lubennikova wrote: > On 03.01.2021 14:29, Noah Misch wrote: > >On Thu, Jun 11, 2020 at 07:58:43PM +0300, Anastasia Lubennikova wrote: > Thank you for the review. > New version of the patch is attached, though I haven't tested it properly > yet. I am planning to do so in a couple of days. Once that testing completes, please change the commitfest entry status to Ready for Committer. > >>+ snprintf(output_path, sizeof(output_path), "revoke_objects.sql"); > >If an upgrade deleted function pg_last_wal_replay_lsn, upgrading a database > >in > >which "REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM public" happened > >requires a GRANT. Can you use a file name that will still make sense if > >someone enhances pg_upgrade to generate such GRANT statements? > I changed the name to 'fix_system_objects_ACL.sql'. Does it look good to > you? That name is fine with me. > > ALTER VIEW pg_locks RENAME COLUMN objid TO "a.b"; > > GRANT SELECT ("a.b") ON pg_locks TO backup; > > > >After which revoke_objects.sql has: > > > > psql:./revoke_objects.sql:9: ERROR: syntax error at or near "") ON > > pg_catalog.pg_locks."" > > LINE 1: REVOKE ALL (b") ON pg_catalog.pg_locks."a FROM "backup"; > > > >While that ALTER VIEW probably should have required allow_system_table_mods, > >we need to assume such databases exist. > > I've fixed it by using pg_identify_object_as_address(). Now we don't have to > parse obj_identity. Nice.
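The failure Noah demonstrated comes from re-parsing a flattened identity string: once a column named a.b is quoted into "a.b", naive splitting on dots or quotes mangles it, which is why switching to pg_identify_object_as_address() (which returns the parts separately) fixes the problem. As a sketch of the quoting rules involved, here is a simplified stand-alone analog of the server's quote_identifier() — simplified in that the real function also forces quoting for reserved keywords:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Simplified analog of PostgreSQL's quote_identifier(): leave the name
 * bare only if it is all lower-case letters, digits, or underscores and
 * does not start with a digit; otherwise wrap it in double quotes,
 * doubling any embedded double quote.
 */
static void
quote_ident(const char *name, char *out, size_t outlen)
{
    bool safe = (*name != '\0' && !isdigit((unsigned char) *name));

    for (const char *p = name; safe && *p; p++)
        if (!(islower((unsigned char) *p) ||
              isdigit((unsigned char) *p) || *p == '_'))
            safe = false;

    if (safe)
    {
        snprintf(out, outlen, "%s", name);
        return;
    }

    size_t i = 0;
    out[i++] = '"';
    for (const char *p = name; *p && i + 2 < outlen; p++)
    {
        if (*p == '"')
            out[i++] = '"';    /* double any embedded quote */
        out[i++] = *p;
    }
    out[i++] = '"';
    out[i] = '\0';
}
```

Once names like "a.b" or "we""ird" appear in a concatenated identity string, splitting it back into schema/relation/column parts requires a real parser — returning the parts separately, as pg_identify_object_as_address() does, sidesteps the problem entirely.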
Re: adding wait_start column to pg_locks
On 2021/01/18 12:00, torikoshia wrote: On 2021-01-15 15:23, torikoshia wrote: Thanks for your review and comments! On 2021-01-14 12:39, Ian Lawrence Barwick wrote: Looking at the code, this happens as the wait start time is being recorded in the lock record itself, so always contains the value reported by the latest lock acquisition attempt. I think you are right and wait_start should not be recorded in the LOCK. On 2021-01-15 11:48, Ian Lawrence Barwick wrote: On Fri, 15 Jan 2021 at 3:45, Robert Haas wrote: On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick wrote: It looks like the logical place to store the value is in the PROCLOCK structure; ... That seems surprising, because there's one PROCLOCK for every combination of a process and a lock. But, a process can't be waiting for more than one lock at the same time, because once it starts waiting to acquire the first one, it can't do anything else, and thus can't begin waiting for a second one. So I would have thought that this would be recorded in the PROC. Umm, I think we're at cross-purposes here. The suggestion is to note the time when the process started waiting for the lock in the process's PROCLOCK, rather than in the lock itself (which in the original version of the patch resulted in all processes with an interest in the lock appearing to have been waiting to acquire it since the time a lock acquisition was most recently attempted). AFAIU, it seems possible to record wait_start in the PROCLOCK, but redundant since each process can wait for at most one lock. To confirm my understanding, I'm going to make another patch that records wait_start in the PGPROC. Attached a patch. I noticed previous patches left the wait_start untouched even after the lock was acquired. The patch also fixes that. Any thoughts? Thanks for updating the patch! I think that this is a really useful feature!! I have two minor comments. 
+ + wait_start timestamptz The column name "wait_start" should be "waitstart" for the sake of consistency with other column names in pg_locks? pg_locks seems to avoid including an underscore in column names, so "locktype" is used instead of "lock_type", "virtualtransaction" is used instead of "virtual_transaction", etc. + Lock acquisition wait start time. NULL if + lock acquired. There seems to be a case where the wait start time is NULL even when "grant" is false. It might be better to add a note about that case to the docs? For example, I found that the wait start time is NULL while the startup process is waiting for the lock. Is this the only such case? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
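The design the thread settles on — one wait-start timestamp per backend, since a backend can wait for at most one lock at a time — can be modeled with a tiny stand-alone sketch. The struct and field names here are invented stand-ins for the PGPROC field the patch adds, with 0 playing the role of the NULL that pg_locks shows for already-granted locks:

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t TimestampTz;   /* microseconds; 0 means "not waiting" */

/* Invented, minimal stand-in for the per-backend PGPROC fields. */
typedef struct MyProc
{
    TimestampTz waitStart;
} MyProc;

/* A backend waits for at most one lock, so one field per backend suffices. */
static void
lock_wait_begin(MyProc *proc, TimestampTz now)
{
    proc->waitStart = now;
}

static void
lock_acquired(MyProc *proc)
{
    /* Reset on acquisition, so the view can show NULL for granted locks. */
    proc->waitStart = 0;
}
```

Recording the field per backend rather than per lock (or per PROCLOCK) is what avoids the original bug, where every process with an interest in a lock appeared to have been waiting since the most recent acquisition attempt.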
Re: REINDEX backend filtering
On Wed, Dec 16, 2020 at 8:27 AM Michael Paquier wrote: > > On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote: > > Is this really a common enough operation that we need it in the main > > grammar? > > > > Having the functionality, definitely, but what if it was "just" a > > function instead? So you'd do something like: > > SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here) > > \gexec > > > > Or even a function that returns the REINDEX commands directly (taking > > a parameter to turn on/off concurrency for example). > > > > That also seems like it would be easier to make flexible, and just as > > easy to plug into reindexdb? > > Having control in the grammar to choose which index to reindex for a > table is very useful when it comes to parallel reindexing, because > it is a no-brainer in terms of knowing which index to distribute to one > job or another. In short, with this grammar you can just issue a set > of REINDEX TABLE commands that we know will not conflict with each > other. You cannot get that level of control with REINDEX INDEX, as it > may be possible that two or more commands conflict if they work on an > index of the same relation, because it is required to take a lock also on > the parent table. Of course, we could decide to implement a > redistribution logic in all frontend tools that need such things, like > reindexdb, but that's not something I think we should let the client > decide. A backend-side filtering is IMO much simpler, less code, > and more elegant. Maybe additional filtering capabilities are not something that will be required frequently, but I'm pretty sure that reindexing only indexes that might be corrupt is something that will be required often. So I agree, having all that logic in the backend makes everything easier for users, having the choice of the tools they want to issue the query while still having all features available. 
There was a conflict with a3dc926009be8 (Refactor option handling of CLUSTER, REINDEX and VACUUM), so rebased version attached. No other changes included yet. v2-0001-Add-a-new-COLLATION-option-to-REINDEX.patch Description: Binary data
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On Wed, Jan 20, 2021 at 6:58 PM Fujii Masao wrote: > + * It checks if the cache has a connection for the given foreign server > that's > + * not being used within current transaction, then returns true. If the > + * connection is in use, then it emits a warning and returns false. > > The comment also should mention the case where no open connection > for the given server is found? What about rewriting this to the following? > > - > If the cached connection for the given foreign server is found and has not > been used within current transaction yet, close the connection and return > true. Even when it's found, if it's already used, keep the connection, emit > a warning and return false. If it's not found, return false. > - Done. > + * It returns true, if it closes at least one connection, otherwise false. > + * > + * It returns false, if the cache doesn't exit. > > The above second comment looks redundant. Yes. "otherwise false" means it. > + if (ConnectionHash) > + result = disconnect_cached_connections(0, true); > > Isn't it smarter to make disconnect_cached_connections() check > ConnectionHash and return false if it's NULL? If we do that, we can > simplify the code of postgres_fdw_disconnect() and _all(). Done. > + * current transaction are disconnected. Otherwise, the unused entries with > the > + * given hashvalue are disconnected. > > In the above second comment, a singular form should be used instead? > Because there must be no multiple entries with the given hashvalue. Rephrased the function comment a bit. Mentioned the _disconnect and _disconnect_all in comments because we have said enough what each of those two functions do. +/* + * Workhorse to disconnect cached connections. + * + * This function disconnects either all unused connections when called from + * postgres_fdw_disconnect_all or a given foreign server unused connection when + * called from postgres_fdw_disconnect. 
+ * + * This function returns true if at least one connection is disconnected, + * otherwise false. + */ > + server = GetForeignServer(entry->serverid); > > This seems to cause an error "cache lookup failed" > if postgres_fdw_disconnect_all() is called when there is > a connection in use but its server is dropped. To avoid this error, > GetForeignServerExtended() with FSV_MISSING_OK should be used > instead, like postgres_fdw_get_connections() does? +1. So, I changed it to GetForeignServerExtended, and added an assertion for invalidation just like postgres_fdw_get_connections. I also added a test case for this; we now emit a slightly different warning for this case alone, that is (errmsg("cannot close dropped server connection because it is still in use"))). This warning looks okay as we cannot show any other server name in the output, and we know that this rare case can exist when someone drops the server in an explicit transaction. > + if (entry->server_hashvalue == hashvalue && > + (entry->xact_depth > 0 || result)) > + { > + hash_seq_term(); > + break; > > entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all() > specifies 0 as hashvalue, ISTM that the above condition can be true > unexpectedly. Can we replace this condition with just "if (!all)"? I don't think entry->server_hashvalue can be zero, because GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0 as a hash value. I have not seen code comparing a hashvalue with the expectation that it can be 0; for instance, see if (hashvalue == 0 || riinfo->oidHashValue == hashvalue). Having if (!all) something like below there doesn't suffice, because we might call hash_seq_term() when some connection other than the given foreign server connection is in use. Our intention is to call hash_seq_term() only when the given server is found and it's either in use or has been closed. 
if (!all && (entry->xact_depth > 0 || result)) { hash_seq_term(); break; } Given the above points, the existing check looks good to me. > +-- Closes loopback connection, returns true and issues a warning as loopback2 > +-- connection is still in use and can not be closed. > +SELECT * FROM postgres_fdw_disconnect_all(); > +WARNING: cannot close connection for server "loopback2" because it is still > in use > + postgres_fdw_disconnect_all > +----------------------------- > + t > +(1 row) > > After the above test, isn't it better to call postgres_fdw_get_connections() > to check that loopback is not output? +1. > +WARNING: cannot close connection for server "loopback" because it is still > in use > +WARNING: cannot close connection for server "loopback2" because it is still > in use > > Just in case, please let me confirm that the order of these warning > messages is
Re: Parallel INSERT (INTO ... SELECT ...)
Hi, For v12-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch: + bool isParallelModifyLeader = IsA(planstate, GatherState) && IsA(outerPlanState(planstate), ModifyTableState); Please wrap long line. + uint64 *processed_count_space; If I read the code correctly, it seems it can be dropped (use pei->processed_count directly). Cheers On Wed, Jan 20, 2021 at 6:29 PM Zhihong Yu wrote: > Hi, > For v12-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch : > > is found from the additional parallel-safety checks, or from the existing > parallel-safety checks for SELECT that it currently performs. > > existing and 'it currently performs' are redundant. You can omit 'that it > currently performs'. > > Minor. For index_expr_max_parallel_hazard_for_modify(), > > + if (keycol == 0) > + { > + /* Found an index expression */ > > You can check if keycol != 0, continue with the loop. This would save some > indent. > > + if (index_expr_item == NULL)/* shouldn't happen */ > + { > + elog(WARNING, "too few entries in indexprs list"); > > I think the warning should be an error since there is assertion ahead of > the if statement. > > + Assert(!isnull); > + if (isnull) > + { > + /* > +* This shouldn't ever happen, but if it does, log a > WARNING > +* and return UNSAFE, rather than erroring out. > +*/ > + elog(WARNING, "null conbin for constraint %u", con->oid); > > The above should be error as well. > > Cheers > > On Wed, Jan 20, 2021 at 5:06 PM Greg Nancarrow > wrote: > >> Thanks for the feedback. >> Posting an updated set of patches. Changes are based on feedback, as >> detailed below: >> >> There's a couple of potential issues currently being looked at: >> - locking issues in additional parallel-safety checks? 
>> - apparent uneven work distribution across the parallel workers, for >> large insert data >> >> >> [Antonin] >> - Fixed bad Assert in PrepareParallelMode() >> - Added missing comment to explain use of GetCurrentCommandId() in >> PrepareParallelMode() >> - Some variable name shortening in a few places >> - Created common function for creation of non-parallel and parallel >> ModifyTable paths >> - Path count variable changed to bool >> - Added FIXME comment to dubious code for creating Gather target-list >> from ModifyTable subplan >> - Fixed check on returningLists to use NIL instead of NULL >> >> [Amit] >> - Moved additional parallel-safety checks (for modify case) into >> max_parallel_hazard() >> - Removed redundant calls to max_parallel_hazard_test() >> - Added Asserts to "should never happen" null-attribute cases (and >> added WARNING log missing from one case) >> - Added comment for use of NoLock in max_parallel_hazard_for_modify() >> >> [Vignesh] >> - Fixed a couple of typos >> - Added a couple of test cases for testing that the same transaction >> is used by all parallel workers >> >> >> Regards, >> Greg Nancarrow >> Fujitsu Australia >> >
Re: [PATCH 1/1] Initial mach based shared memory support.
On Sun, Nov 22, 2020 at 9:19 PM James Hilliard wrote: > In order to avoid hitting these limits we can bypass the wrapper layer > and just use mach directly. FWIW I looked into using mach_vm_allocate() years ago because I wanted to be able to use its VM_FLAGS_SUPERPAGE_SIZE_2MB flag to implement PostgreSQL's huge_pages option for the main shared memory area, but I ran into some difficulty getting such a mapping to be inherited by fork() children. There may be some way to get past that, but it seems the current crop of Apple Silicon has only 4KB and 16KB pages and I don't know if that's interesting enough. On the other hand, I just saw a claim that "running an arm64 Debian VM on Apple M1, using a 16K page size instead of 4K reduces this kernel build time by *16%*". [1] https://twitter.com/AtTheHackOfDawn/status/1333895115174187011
Re: POC: postgres_fdw insert batching
On 1/21/21 3:09 AM, tsunakawa.ta...@fujitsu.com wrote: From: Tomas Vondra Right, that's pretty much what I ended up doing (without the CMD_INSERT check it'd add batching info to explain for updates too, for example). I'll do a bit more testing on the attached patch, but I think that's the right fix to push. Thanks to the outer check for operation == CMD_INSERT, the inner one became unnecessary. Right. I've pushed the fix, hopefully buildfarm will get happy again. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Parallel INSERT (INTO ... SELECT ...)
Hi, For v12-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch : is found from the additional parallel-safety checks, or from the existing parallel-safety checks for SELECT that it currently performs. existing and 'it currently performs' are redundant. You can omit 'that it currently performs'. Minor. For index_expr_max_parallel_hazard_for_modify(), + if (keycol == 0) + { + /* Found an index expression */ You can check if keycol != 0, continue with the loop. This would save some indent. + if (index_expr_item == NULL)/* shouldn't happen */ + { + elog(WARNING, "too few entries in indexprs list"); I think the warning should be an error since there is assertion ahead of the if statement. + Assert(!isnull); + if (isnull) + { + /* +* This shouldn't ever happen, but if it does, log a WARNING +* and return UNSAFE, rather than erroring out. +*/ + elog(WARNING, "null conbin for constraint %u", con->oid); The above should be error as well. Cheers On Wed, Jan 20, 2021 at 5:06 PM Greg Nancarrow wrote: > Thanks for the feedback. > Posting an updated set of patches. Changes are based on feedback, as > detailed below: > > There's a couple of potential issues currently being looked at: > - locking issues in additional parallel-safety checks? 
> - apparent uneven work distribution across the parallel workers, for > large insert data > > > [Antonin] > - Fixed bad Assert in PrepareParallelMode() > - Added missing comment to explain use of GetCurrentCommandId() in > PrepareParallelMode() > - Some variable name shortening in a few places > - Created common function for creation of non-parallel and parallel > ModifyTable paths > - Path count variable changed to bool > - Added FIXME comment to dubious code for creating Gather target-list > from ModifyTable subplan > - Fixed check on returningLists to use NIL instead of NULL > > [Amit] > - Moved additional parallel-safety checks (for modify case) into > max_parallel_hazard() > - Removed redundant calls to max_parallel_hazard_test() > - Added Asserts to "should never happen" null-attribute cases (and > added WARNING log missing from one case) > - Added comment for use of NoLock in max_parallel_hazard_for_modify() > > [Vignesh] > - Fixed a couple of typos > - Added a couple of test cases for testing that the same transaction > is used by all parallel workers > > > Regards, > Greg Nancarrow > Fujitsu Australia >
Re: Printing backtrace of postgres processes
On Thu, 21 Jan 2021 at 09:56, Tom Lane wrote: > Craig Ringer writes: > > On Wed, 20 Jan 2021 at 05:23, Tom Lane wrote: > >> BTW, it also looks like the patch is doing nothing to prevent the > >> backtrace from being sent to the connected client. > > > I don't see a good reason to send a bt to a client. Even though these > > backtraces won't be analysing debuginfo and populating args, locals, etc, > > it should still just go to the server log. > > Yeah. That's easier than I was thinking, we just need to > s/LOG/LOG_SERVER_ONLY/. > > >> Maybe, given all of these things, we should forget using elog at > >> all and just emit the trace with fprintf(stderr). > > > That causes quite a lot of pain with MemoryContextStats() already > > True. Given the changes discussed in the last couple messages, I don't > see any really killer reasons why we can't ship the trace through elog. > We can always try that first, and back off to fprintf if we do find > reasons why it's too unstable. > Yep, works for me. Thanks for being open to considering this. I know lots of this stuff can seem like a pointless sidetrack because the utility of it is not obvious on dev systems or when you're doing your own hands-on expert support on systems you own and operate yourself. These sorts of things really only start to make sense when you're touching many different postgres systems "in the wild" - such as commercial support services, helping people on -general, -bugs or stackoverflow, etc. I really appreciate your help with it.
RE: POC: postgres_fdw insert batching
From: Tom Lane > The "single target table" could be partitioned, in which case there'll be > multiple resultrelinfos, some of which could be foreign tables. Thank you. I thought so at first, but later I found that ExecInsert() only handles one element in mtstate->resultRelInfo. So I thought just the first element is processed in the INSERT case. I understood (guessed) that the for loop is for UPDATE and DELETE. EXPLAIN without ANALYZE of UPDATE/DELETE on a partitioned table shows partitions, which would be mtstate->resultRelInfo. EXPLAIN on INSERT doesn't show partitions, so I think INSERT will find relevant partitions based on input rows during execution. Regards Takayuki Tsunakawa
RE: ResourceOwner refactoring
Dear Heikki, I tested in the same situation, and I confirmed that almost the same results are returned. (Results are at the end of the e-mail.) You think that these results are acceptable because resowners own many resources (more than 64) in general and it's mainly faster in such a situation, isn't it? I cannot say for certain, but that sounds good.

test result

unpatched

 numkeep | numsnaps | lifo_time_ns | fifo_time_ns
---------+----------+--------------+--------------
       0 |        1 |         68.5 |         69.9
       0 |        5 |         73.2 |         76.8
       0 |       10 |         70.6 |         74.7
       0 |       60 |         68.0 |         75.6
       0 |       70 |         91.3 |         94.8
       0 |      100 |         89.0 |         89.1
       0 |     1000 |         97.9 |         98.9
       0 |    10000 |        116.0 |        115.9
       9 |       10 |         74.7 |         76.6
       9 |      100 |         80.8 |         80.1
       9 |     1000 |         86.0 |         86.2
       9 |    10000 |        116.1 |        116.8
      65 |       70 |         84.7 |         85.3
      65 |      100 |         80.5 |         80.3
      65 |     1000 |         86.3 |         86.2
      65 |    10000 |        115.4 |        115.9
(16 rows)

patched

 numkeep | numsnaps | lifo_time_ns | fifo_time_ns
---------+----------+--------------+--------------
       0 |        1 |         62.4 |         62.6
       0 |        5 |         68.0 |         66.9
       0 |       10 |         73.6 |         78.1
       0 |       60 |         82.3 |         87.2
       0 |       70 |         83.0 |         89.1
       0 |      100 |         82.8 |         87.9
       0 |     1000 |         88.2 |         96.6
       0 |    10000 |        119.6 |        124.5
       9 |       10 |         62.0 |         62.8
       9 |      100 |         75.3 |         78.0
       9 |     1000 |         82.6 |         89.3
       9 |    10000 |        116.6 |        122.6
      65 |       70 |         66.7 |         66.4
      65 |      100 |         74.6 |         77.2
      65 |     1000 |         82.1 |         88.2
      65 |    10000 |        118.0 |        124.1
(16 rows)

Best Regards, Hayato Kuroda FUJITSU LIMITED
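For readers wondering why the fifo timings above trend slightly higher than the lifo ones: in an array-based resource owner, forgetting the most recently remembered resource hits the last slot immediately, while forgetting the oldest one scans the whole array. The following is an invented, minimal analog of the remember/forget pattern, not the actual ResourceOwner code (which, among other things, spills to a hash table past a threshold):

```c
#include <assert.h>

/*
 * Minimal array-based "remember/forget" analog.  Forgetting in LIFO
 * order finds the resource in the last slot at once; forgetting in FIFO
 * order scans from the end, which is the plausible source of the small
 * lifo/fifo gap in the benchmark.
 */
#define MAX_RES 128

typedef struct ResOwner
{
    void *res[MAX_RES];
    int   nres;
} ResOwner;

static void
res_remember(ResOwner *o, void *r)
{
    assert(o->nres < MAX_RES);
    o->res[o->nres++] = r;
}

/* Returns the number of slots inspected, to make the cost visible. */
static int
res_forget(ResOwner *o, void *r)
{
    int looked = 0;

    for (int i = o->nres - 1; i >= 0; i--)
    {
        looked++;
        if (o->res[i] == r)
        {
            o->res[i] = o->res[o->nres - 1];  /* swap last into the hole */
            o->nres--;
            return looked;
        }
    }
    assert(0 && "resource not remembered");
    return looked;
}
```

With resources released in LIFO order the scan cost stays constant; releasing in FIFO order makes each forget O(n), which is consistent with (though not proof of) the shape of the numbers reported above.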
Re: Add docs stub for recovery.conf
On Wed, 20 Jan 2021 at 02:45, Stephen Frost wrote: > Greetings, > > * Stephen Frost (sfr...@snowman.net) wrote: > > * Craig Ringer (craig.rin...@enterprisedb.com) wrote: > > > On Thu, 14 Jan 2021 at 03:44, Stephen Frost > wrote: > > > > Alright, how does this look? The new entries are all under the > > > > 'obsolete' section to keep it out of the main line, but should work to > > > > 'fix' the links that currently 404 and provide a bit of a 'softer' > > > > landing for the other cases that currently just forcibly redirect using > > > > the website doc alias capability. > > > > > > Thanks for expanding the change to other high profile obsoleted or renamed > > > features and tools. > > > > Thanks for taking the time to review it and comment on it! > > > > > One minor point. I'm not sure this is quite the best way to spell the index > > > entries:
> > >
> > > +   <indexterm>
> > > +    <primary>obsolete</primary>
> > > +    <secondary>pg_receivexlog</secondary>
> > > +   </indexterm>
> > >
> > > as it will produce an index term "obsolete" with a list of various > > > components under it. While that concentrates them nicely, it means people > > > won't actually find them if they're using the index alphabetically. > > > > Ah, yeah, that's definitely a good point and one that I hadn't really > > spent much time thinking about. > > > > > I'd slightly prefer
> > >
> > > +   <indexterm>
> > > +    <primary>pg_receivexlog</primary>
> > > +    <seealso>pg_receivewal</seealso>
> > > +   </indexterm>
> > >
> > > even though that bulks the index up a little, because then people are a bit > > > more likely to find it. > > > > Yup, makes sense, updated patch attached which makes that change. > > > > > > I ended up not actually doing this for the catalog -> view change of > > > > pg_replication_slots simply because I don't really think folks will > > > > misunderstand or be confused by that redirect since it's still the same > > > > relation. If others disagree though, we could certainly change that > > > > too. > > > > > > I agree with you. > > > > Ok, great. > > > > How does the attached look then? 
> Pretty good to me. Thanks so much for your help and support with this. Index entries render as e.g.
    pg_xlogdump, The pg_xlogdump command (see also pg_waldump)
whereas with the obsolete subhead they would render as something like:
    obsolete, Obsolete or renamed features, settings and files
        pg_xlogdump, The pg_xlogdump command
The see also spelling is much easier to find in the index but doesn't make it as obvious that it's obsoleted/replaced. A look at the docbook docs suggests we should use <see> not <seealso> for these. A quick
    sed -i -e 's/<seealso>/<see>/g' -e 's/<\/seealso>/<\/see>/g' doc/src/sgml/appendix-obsolete*
causes them to render much better:
    pg_receivexlog, The pg_receivexlog command (see pg_receivewal)
It might be worth changing the <secondary>s too, so I've done so in the attached. The terms now render as:
    pg_receivexlog, pg_receivexlog renamed to pg_recievewal (see pg_receivewal)
which is good enough in my opinion. The duplication is messy but an expected artifact of index generation. I don't see any docbook attribute that lets you suppress insertion of the <title> of the section containing the <indexterm>, and it's not worth fiddling to try to eliminate it with structural hacks. The attached changes the titles, changes <seealso> to <see>, and also updates the comments in the obsolete entries SGML docs to specify that the id must be unchanged + give a recommended index term format. From c11e0f079c07c669abac0f00ae3f1ebdfc18eae7 Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Thu, 21 Jan 2021 10:01:29 +0800 Subject: [PATCH v3] Add a docs section for obsoleted and renamed functions and settings The new appendix groups information on renamed or removed settings, commands, etc into an out-of-the-way part of the docs. The original id elements are retained in each subsection to ensure that the same filenames are produced for HTML docs. This prevents /current/ links on the web from breaking, and allows users of the web docs to follow links from old version pages to info on the changes in the new version. 
Prior to this change, a link to /current/ for renamed sections like the recovery.conf docs would just 404. Similarly if someone searched for recovery.conf they would find the pg11 docs, but there would be no /12/ or /current/ link, so they couldn't easily find out that it was removed in pg12 or how to adapt. Index entries are also added so that there's a breadcrumb trail for users to follow when they know the old name, but not what we changed it to. So a user who is trying to find out how to set standby_mode in PostgreSQL 12+, or where pg_resetxlog went, now has more chance of finding that information. Craig Ringer and Stephen Frost
---
 .../sgml/appendix-obsolete-pgreceivexlog.sgml | 24
 .../sgml/appendix-obsolete-pgresetxlog.sgml   | 24
 .../sgml/appendix-obsolete-pgxlogdump.sgml    | 24
 .../appendix-obsolete-recovery-config.sgml    | 58 +++
 doc/src/sgml/appendix-obsolete.sgml
Re: [PoC] Non-volatile WAL buffer
On Thu, Jan 7, 2021 at 2:16 AM Tomas Vondra wrote: > > Hi, > > I think I've managed to get the 0002 patch [1] rebased to master and > working (with help from Masahiko Sawada). It's not clear to me how it > could have worked as submitted - my theory is that an incomplete patch > was submitted by mistake, or something like that. > > Unfortunately, the benchmark results were kinda disappointing. For a > pgbench on scale 500 (fits into shared buffers), an average of three > 5-minute runs looks like this:
>
>   branch                 1     16     32     64     96
>   ------------------------------------------------------
>   master              7291  87704 165310 150437 224186
>   ntt                 7912 106095 213206 212410 237819
>   simple-no-buffers   7654  96544 115416  95828 103065
>
> NTT refers to the patch from September 10, pre-allocating a large WAL > file on PMEM, and simple-no-buffers is the simpler patch simply removing > the WAL buffers and writing directly to a mmap-ed WAL segment on PMEM. > > Note: The patch is just replacing the old implementation with mmap. > That's good enough for experiments like this, but we probably want to > keep the old one for setups without PMEM. But it's good enough for > testing, benchmarking etc. > > Unfortunately, the results for this simple approach are pretty bad. Not > only compared to the "ntt" patch, but even to master. I'm not entirely > sure what's the root cause, but I have a couple hypotheses: > > 1) bug in the patch - That's clearly a possibility, although I've > tried to eliminate this possibility. > > 2) PMEM is slower than DRAM - From what I know, PMEM is much faster than > NVMe storage, but still much slower than DRAM (both in terms of latency > and bandwidth, see [2] for some data). It's not terrible, but the > latency is maybe 2-3x higher - not a huge difference, but may matter for > WAL buffers? > > 3) PMEM does not handle parallel writes well - If you look at [2], > Figure 4(b), you'll see that the throughput actually *drops* as the > number of threads increases. 
That's pretty strange / annoying, because > that's how we write into WAL buffers - each thread writes its own data, > so parallelism is not something we can get rid of. > > I've added some simple profiling, to measure number of calls / time for > each operation (use -DXLOG_DEBUG_STATS to enable). It accumulates data > for each backend, and logs the counts every 1M ops. > > Typical stats from a concurrent run look like this:
>
>   xlog stats cnt 4300
>       map cnt 100 time 5448333 unmap cnt 100 time 3730963
>       memcpy cnt 985964 time 1550442272 len 15150499
>       memset cnt 0 time 0 len 0
>       persist cnt 13836 time 10369617 len 16292182
>
> The times are in nanoseconds, so this says the backend did 100 mmap and > unmap calls, taking ~10ms in total. There were ~14k pmem_persist calls, > taking 10ms in total. And the most time (~1.5s) was used by pmem_memcpy > copying about 15MB of data. That's quite a lot :-( It might also be interesting if we can see how much time is spent in each logging function, such as XLogInsert(), XLogWrite(), and XLogFlush(). > > My conclusion from this is that eliminating WAL buffers and writing WAL > directly to PMEM (by memcpy to mmap-ed WAL segments) is probably not the > right approach. > > I suppose we should keep WAL buffers, and then just write the data to > mmap-ed WAL segments on PMEM. Which I think is what the NTT patch does, > except that it allocates one huge file on PMEM and writes to that > (instead of the traditional WAL segments). > > So I decided to try how it'd work with writing to regular WAL segments, > mmap-ed ad hoc. The pmem-with-wal-buffers-master.patch patch does that, > and the results look a bit nicer:
>
>   branch                 1     16     32     64     96
>   ------------------------------------------------------
>   master              7291  87704 165310 150437 224186
>   ntt                 7912 106095 213206 212410 237819
>   simple-no-buffers   7654  96544 115416  95828 103065
>   with-wal-buffers    7477  95454 181702 140167 214715
>
> So, much better than the version without WAL buffers, somewhat better > than master (except for 64/96 clients), but still not as good as NTT. 
> > At this point I was wondering how the NTT patch could be faster when it's doing roughly the same thing. I'm sure there are some differences, but it seemed strange. The main difference seems to be that it only maps one large file, and only once. OTOH the alternative "simple" patch maps segments one by one, in each backend. Per the debug stats the map/unmap calls are fairly cheap, but maybe it interferes with the memcpy somehow. >

While looking at the two methods, NTT and simple-no-buffer, I realized that in XLogFlush(), the NTT patch flushes (by pmem_flush() and pmem_drain()) WAL
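As a rough illustration of what the "simple-no-buffers" approach under discussion does — mapping a WAL segment and memcpy-ing records into it before persisting — here is a minimal sketch. It is not the patch's code: it uses plain mmap()/msync() as stand-ins for libpmem's pmem_map_file()/pmem_persist(), and the segment path, 16MB size, and record layout are invented for the example.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SEG_SIZE (16 * 1024 * 1024)     /* illustrative WAL segment size */

/* Map (creating if needed) a fixed-size segment file; NULL on failure. */
static char *
map_segment(const char *path)
{
    int   fd = open(path, O_RDWR | O_CREAT, 0600);
    char *base;

    if (fd < 0)
        return NULL;
    if (ftruncate(fd, SEG_SIZE) != 0)
    {
        close(fd);
        return NULL;
    }
    base = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping keeps the file alive */
    return (base == MAP_FAILED) ? NULL : base;
}

/*
 * Copy one record into the mapped segment and force it to stable
 * storage.  On real PMEM this would be pmem_memcpy()/pmem_persist();
 * msync() requires a page-aligned start address, so round down.
 */
static int
write_record(char *base, size_t off, const void *rec, size_t len)
{
    size_t pagesz = (size_t) sysconf(_SC_PAGESIZE);
    size_t start = off & ~(pagesz - 1);

    if (off + len > SEG_SIZE)
        return -1;              /* record does not fit in this segment */
    memcpy(base + off, rec, len);
    return msync(base + start, (off - start) + len, MS_SYNC);
}
```

The per-record msync() here is the analogue of the pmem_persist() calls counted in the profiling output above.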
RE: POC: postgres_fdw insert batching
From: Tomas Vondra > Right, that's pretty much what I ended up doing (without the CMD_INSERT > check it'd add batching info to explain for updates too, for example). > I'll do a bit more testing on the attached patch, but I think that's the > right fix to > push. Thanks to the outer check for operation == CMD_INSERT, the inner one became unnecessary. Regards Takayuki Tsunakawa
Re: POC: postgres_fdw insert batching
"tsunakawa.ta...@fujitsu.com" writes: > Just for learning, could anyone tell me what this loop for? I thought > current Postgres's DML supports a single target table, so it's enough to > handle the first element of mtstate->resultRelInfo. The "single target table" could be partitioned, in which case there'll be multiple resultrelinfos, some of which could be foreign tables. regards, tom lane
Re: POC: postgres_fdw insert batching
On 1/21/21 2:53 AM, Amit Langote wrote: On Thu, Jan 21, 2021 at 10:42 AM Tomas Vondra wrote: On 1/21/21 2:24 AM, Amit Langote wrote: On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra wrote: On 1/21/21 1:17 AM, Zhihong Yu wrote: Hi, The assignment to resultRelInfo is done when junk_filter_needed is true: if (junk_filter_needed) { resultRelInfo = mtstate->resultRelInfo; Should the code for determining batch size access mtstate->resultRelInfo directly ? IMO the issue is that code iterates over all plans and moves to the next for each one: resultRelInfo++; so it ends up pointing past the last element, hence the failures. So yeah, either the code needs to move before the loop (per my patch), or we need to access mtstate->resultRelInfo directly. Accessing mtstate->resultRelInfo directly would do. The only constraint on where this block should be placed is that ri_projectReturning must be valid as of calling GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread. So, after this block in ExecInitModifyTable: /* * Initialize RETURNING projections if needed. */ if (node->returningLists) { /* * Build a projection for each result rel. */ resultRelInfo = mtstate->resultRelInfo; foreach(l, node->returningLists) { List *rlist = (List *) lfirst(l); resultRelInfo->ri_returningList = rlist; resultRelInfo->ri_projectReturning = ExecBuildProjectionInfo(rlist, econtext, slot, >ps, resultRelInfo->ri_RelationDesc->rd_att); resultRelInfo++; } } Right. But I think Tom is right this should initialize ri_BatchSize for all the resultRelInfo elements, not just the first one. Per the attached patch, which resolves the issue both on x86_64 and armv7l for me. +1 in general. To avoid looping uselessly in the case of UPDATE/DELETE where batching can't be used today, I'd suggest putting if (operation == CMD_INSERT) around the loop. Right, that's pretty much what I ended up doing (without the CMD_INSERT check it'd add batching info to explain for updates too, for example). 
I'll do a bit more testing on the attached patch, but I think that's the right fix to push. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c36860704..a4870d621a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2797,18 +2797,30 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * Determine if the FDW supports batch insert and determine the batch
 	 * size (a FDW may support batching, but it may be disabled for the
 	 * server/table).
+	 *
+	 * We only do this for INSERT, so that for UPDATE/DELETE the value
+	 * remains set to 0.
 	 */
-	if (!resultRelInfo->ri_usesFdwDirectModify &&
-		operation == CMD_INSERT &&
-		resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-		resultRelInfo->ri_BatchSize =
-			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
-	else
-		resultRelInfo->ri_BatchSize = 1;
+	if (operation == CMD_INSERT)
+	{
+		resultRelInfo = mtstate->resultRelInfo;
+		for (i = 0; i < nplans; i++)
+		{
+			if (!resultRelInfo->ri_usesFdwDirectModify &&
+				operation == CMD_INSERT &&
+				resultRelInfo->ri_FdwRoutine != NULL &&
+				resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+				resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+				resultRelInfo->ri_BatchSize =
+					resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+			else
+				resultRelInfo->ri_BatchSize = 1;
+
+			Assert(resultRelInfo->ri_BatchSize >= 1);
 
-	Assert(resultRelInfo->ri_BatchSize >= 1);
+			resultRelInfo++;
+		}
+	}
 
 	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it
RE: POC: postgres_fdw insert batching
From: Zhihong Yu > My first name is Zhihong. > You can call me Ted if you want to save some typing :-) Ah, I'm very sorry. Thank you, let me call you Ted then. That can't be mistaken. Regards Takayuki Tsunakawa
Re: Printing backtrace of postgres processes
Craig Ringer writes: > On Wed, 20 Jan 2021 at 05:23, Tom Lane wrote: >> BTW, it also looks like the patch is doing nothing to prevent the >> backtrace from being sent to the connected client. > I don't see a good reason to send a bt to a client. Even though these > backtraces won't be analysing debuginfo and populating args, locals, etc, > it should still just go to the server log. Yeah. That's easier than I was thinking, we just need to s/LOG/LOG_SERVER_ONLY/. >> Maybe, given all of these things, we should forget using elog at >> all and just emit the trace with fprintf(stderr). > That causes quite a lot of pain with MemoryContextStats() already True. Given the changes discussed in the last couple messages, I don't see any really killer reasons why we can't ship the trace through elog. We can always try that first, and back off to fprintf if we do find reasons why it's too unstable. regards, tom lane
Re: POC: postgres_fdw insert batching
On Thu, Jan 21, 2021 at 10:42 AM Tomas Vondra wrote: > On 1/21/21 2:24 AM, Amit Langote wrote: > > On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra > > wrote: > >> On 1/21/21 1:17 AM, Zhihong Yu wrote: > >>> Hi, > >>> The assignment to resultRelInfo is done when junk_filter_needed is true: > >>> > >>> if (junk_filter_needed) > >>> { > >>> resultRelInfo = mtstate->resultRelInfo; > >>> > >>> Should the code for determining batch size access mtstate->resultRelInfo > >>> directly ? > >>> > >> > >> IMO the issue is that code iterates over all plans and moves to the next > >> for each one: > >> > >> resultRelInfo++; > >> > >> so it ends up pointing past the last element, hence the failures. So > >> yeah, either the code needs to move before the loop (per my patch), or > >> we need to access mtstate->resultRelInfo directly. > > > > Accessing mtstate->resultRelInfo directly would do. The only > > constraint on where this block should be placed is that > > ri_projectReturning must be valid as of calling > > GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread. > > So, after this block in ExecInitModifyTable: > > > > /* > > * Initialize RETURNING projections if needed. > > */ > > if (node->returningLists) > > { > > > > /* > > * Build a projection for each result rel. > > */ > > resultRelInfo = mtstate->resultRelInfo; > > foreach(l, node->returningLists) > > { > > List *rlist = (List *) lfirst(l); > > > > resultRelInfo->ri_returningList = rlist; > > resultRelInfo->ri_projectReturning = > > ExecBuildProjectionInfo(rlist, econtext, slot, > > >ps, > > > > resultRelInfo->ri_RelationDesc->rd_att); > > resultRelInfo++; > > } > > } > > > > Right. But I think Tom is right this should initialize ri_BatchSize for > all the resultRelInfo elements, not just the first one. Per the attached > patch, which resolves the issue both on x86_64 and armv7l for me. +1 in general. 
To avoid looping uselessly in the case of UPDATE/DELETE where batching can't be used today, I'd suggest putting if (operation == CMD_INSERT) around the loop. -- Amit Langote EDB: http://www.enterprisedb.com
RE: POC: postgres_fdw insert batching
From: Tomas Vondra > Right. But I think Tom is right this should initialize ri_BatchSize for all the > resultRelInfo elements, not just the first one. Per the attached patch, which > resolves the issue both on x86_64 and armv7l for me. I think your patch is perfect in the sense that it's ready for the future multi-target DML support. +1 Just for learning, could anyone tell me what this loop is for? I thought current Postgres's DML supports a single target table, so it's enough to handle the first element of mtstate->resultRelInfo. In that sense, Amit-san and I agreed that we don't put the if block in the for loop yet. Regards Takayuki Tsunakawa
Re: POC: postgres_fdw insert batching
Hi, Takayuki-san: My first name is Zhihong. You can call me Ted if you want to save some typing :-) Cheers On Wed, Jan 20, 2021 at 5:37 PM tsunakawa.ta...@fujitsu.com < tsunakawa.ta...@fujitsu.com> wrote: > From: Zhihong Yu > > > Do we need to consider how this part of code inside > ExecInitModifyTable() would evolve ? > > > > > I think placing the compound condition toward the end of > ExecInitModifyTable() is reasonable because it checks the latest > information. > > > > +1 for Zaihong-san's idea. But instead of rewriting every relsultRelInfo > to mtstate->resultRelInfo, which makes it a bit harder to read, I'd like to > suggest just adding "resultRelInfo = mtstate->resultRelInfo;" immediately > before the if block. > > > > Thanks a lot, all for helping to solve the problem quickly! > > > > > > Regards > > Takayuki Tsunakawa > > > >
RE: Parallel INSERT (INTO ... SELECT ...)
> > Thanks for the feedback. > Posting an updated set of patches. Changes are based on feedback, as detailed > below: Hi It seems there are some previous comments [1][2] not yet addressed in the current patch. Just to make sure they're not missed. [1] https://www.postgresql.org/message-id/77e1c06ffb2240838e5fc94ec8dcb7d3%40G08CNEXMBPEKD05.g08.fujitsu.local [2] https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com Best regards, houzj
Re: POC: postgres_fdw insert batching
On 1/21/21 2:22 AM, Tom Lane wrote: Tomas Vondra writes: I may be wrong, but the most likely explanation seems to be this is due to the junk filter initialization, which simply moves past the end of the mtstate->resultRelInfo array. resultRelInfo is certainly pointing at garbage at that point. Yup. It's pretty amazing the x86 machines seem to be mostly OK with it. It kinda seems the GetForeignModifyBatchSize call should happen before that block. The attached patch fixes this for me (i.e. regression tests pass with no valgrind reports. Or did I get that wrong? Don't we need to initialize ri_BatchSize for *each* resultrelinfo, not merely the first one? That is, this new code needs to be somewhere inside a loop over the result rels. Yeah, I think you're right. That's an embarrassing oversight :-( regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Printing backtrace of postgres processes
On Wed, 20 Jan 2021 at 05:23, Tom Lane wrote:

> Recursion is scary, but it should (I think) not be possible if this is driven off CHECK_FOR_INTERRUPTS. There will certainly be none of those in libbacktrace.

We can also hold interrupts for the call, and it might be wise to do so.

> One point here is that it might be a good idea to suppress elog.c's calls to functions in the error context stack. As we saw in another connection recently, allowing that to happen makes for a *very* large increase in the footprint of code that you are expecting to work at any random CHECK_FOR_INTERRUPTS call site.

I strongly agree. Treat it as errhidecontext().

> BTW, it also looks like the patch is doing nothing to prevent the backtrace from being sent to the connected client. I'm not sure what I think about whether it'd be okay from a security standpoint to do that on the connection that requested the trace, but I sure as heck don't want it to happen on connections that didn't.

I don't see a good reason to send a bt to a client. Even though these backtraces won't be analysing debuginfo and populating args, locals, etc, it should still just go to the server log.

> Maybe, given all of these things, we should forget using elog at all and just emit the trace with fprintf(stderr).

That causes quite a lot of pain with MemoryContextStats() already as it's frequently difficult to actually locate the output given the variations that exist in customer logging configurations. Sometimes stderr goes to a separate file or to journald. It's also much harder to locate the desired output since there's no log_line_prefix. I have a WIP patch floating around somewhere that tries to teach MemoryContextStats to write to the ereport channel when not called during an actual out-of-memory situation for that reason; an early version is somewhere in the archives. This is one of those "ok in development, painful in production" situations.
So I'm not a big fan of pushing it to stderr, though I'd rather have that than not have the ability at all.
Re: POC: postgres_fdw insert batching
On 1/21/21 2:24 AM, Amit Langote wrote: On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra wrote: On 1/21/21 1:17 AM, Zhihong Yu wrote: Hi, The assignment to resultRelInfo is done when junk_filter_needed is true: if (junk_filter_needed) { resultRelInfo = mtstate->resultRelInfo; Should the code for determining batch size access mtstate->resultRelInfo directly ? IMO the issue is that code iterates over all plans and moves to the next for each one: resultRelInfo++; so it ends up pointing past the last element, hence the failures. So yeah, either the code needs to move before the loop (per my patch), or we need to access mtstate->resultRelInfo directly. Accessing mtstate->resultRelInfo directly would do. The only constraint on where this block should be placed is that ri_projectReturning must be valid as of calling GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread. So, after this block in ExecInitModifyTable: /* * Initialize RETURNING projections if needed. */ if (node->returningLists) { /* * Build a projection for each result rel. */ resultRelInfo = mtstate->resultRelInfo; foreach(l, node->returningLists) { List *rlist = (List *) lfirst(l); resultRelInfo->ri_returningList = rlist; resultRelInfo->ri_projectReturning = ExecBuildProjectionInfo(rlist, econtext, slot, >ps, resultRelInfo->ri_RelationDesc->rd_att); resultRelInfo++; } } Right. But I think Tom is right this should initialize ri_BatchSize for all the resultRelInfo elements, not just the first one. Per the attached patch, which resolves the issue both on x86_64 and armv7l for me. 
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c36860704..10febcae8a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2798,17 +2798,23 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * size (a FDW may support batching, but it may be disabled for the
 	 * server/table).
 	 */
-	if (!resultRelInfo->ri_usesFdwDirectModify &&
-		operation == CMD_INSERT &&
-		resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-		resultRelInfo->ri_BatchSize =
-			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
-	else
-		resultRelInfo->ri_BatchSize = 1;
+	resultRelInfo = mtstate->resultRelInfo;
+	for (i = 0; i < nplans; i++)
+	{
+		if (!resultRelInfo->ri_usesFdwDirectModify &&
+			operation == CMD_INSERT &&
+			resultRelInfo->ri_FdwRoutine != NULL &&
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+			resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+			resultRelInfo->ri_BatchSize =
+				resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+		else
+			resultRelInfo->ri_BatchSize = 1;
+
+		Assert(resultRelInfo->ri_BatchSize >= 1);
 
-	Assert(resultRelInfo->ri_BatchSize >= 1);
+		resultRelInfo++;
+	}
 
 	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On Wed, Jan 20, 2021 at 03:34:39PM -0300, Alvaro Herrera wrote:
> On 2021-Jan-20, Alexey Kondratov wrote:
>> Ugh, forgot to attach the patches. Here they are.
>
> Yeah, looks reasonable.

Patch 0002 still has a whole set of issues as I pointed out a couple of hours ago, but if we agree on 0001 as being useful if done independently, I'd rather get that done first. This way, or just merging both things in a single commit, is not a big deal seeing the amount of code, so I am fine with any approach. It may be possible that 0001 requires more changes depending on the work to be done for 0002, though?

>> +    /* No work if no change in tablespace. */
>> +    oldTablespaceOid = rd_rel->reltablespace;
>> +    if (tablespaceOid != oldTablespaceOid ||
>> +        (tablespaceOid == MyDatabaseTableSpace && OidIsValid(oldTablespaceOid)))
>> +    {
>> +        /* Update the pg_class row. */
>> +        rd_rel->reltablespace = (tablespaceOid == MyDatabaseTableSpace) ?
>> +            InvalidOid : tablespaceOid;
>> +        CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
>> +
>> +        changed = true;
>> +    }
>> +
>> +    if (changed)
>> +        /* Record dependency on tablespace */
>> +        changeDependencyOnTablespace(RelationRelationId,
>> +                                     reloid, rd_rel->reltablespace);
>
> Why have a separate "if (changed)" block here instead of merging with the above?

Yep.

+    if (SetRelTablespace(reloid, newTableSpace))
+        /* Make sure the reltablespace change is visible */
+        CommandCounterIncrement();

At quick glance, I am wondering why you just don't do a CCI within SetRelTablespace().
--
Michael
RE: POC: postgres_fdw insert batching
From: Zhihong Yu > Do we need to consider how this part of code inside ExecInitModifyTable() > would evolve ? > I think placing the compound condition toward the end of > ExecInitModifyTable() is reasonable because it checks the latest information. +1 for Zaihong-san's idea. But instead of rewriting every resultRelInfo to mtstate->resultRelInfo, which makes it a bit harder to read, I'd like to suggest just adding "resultRelInfo = mtstate->resultRelInfo;" immediately before the if block. Thanks a lot, all for helping to solve the problem quickly! Regards Takayuki Tsunakawa
Re: Printing backtrace of postgres processes
On Wed, 20 Jan 2021 at 01:31, Robert Haas wrote: > On Sat, Jan 16, 2021 at 3:21 PM Tom Lane wrote: > > I'd argue that backtraces for those processes aren't really essential, > > and indeed that trying to make the syslogger report its own backtrace > > is damn dangerous. > > I agree. Ideally I'd like to be able to use the same mechanism > everywhere and include those processes too, but surely regular > backends and parallel workers are going to be the things that come up > most often. > > > (Personally, I think this whole patch fails the safety-vs-usefulness > > tradeoff, but I expect I'll get shouted down.) > > You and I are frequently on opposite sides of these kinds of > questions, but I think this is a closer call than many cases. I'm > convinced that it's useful, but I'm not sure whether it's safe. On the > usefulness side, backtraces are often the only way to troubleshoot > problems that occur on production systems. I wish we had better > logging and tracing tools instead of having to ask for this sort of > thing, but we don't. Agreed. In theory we should be able to do this sort of thing using external trace and diagnostic tools like perf, systemtap, etc. In practice, these tools tend to be quite version-sensitive and hard to get right without multiple rounds of back-and-forth to deal with specifics of the site's setup, installed debuginfo or lack thereof, specific tool versions, etc. It's quite common to have to fall back on attaching gdb with a breakpoint on a function in the export symbol table (so it works w/o debuginfo), saving a core, and then analysing the core on a separate system on which debuginfo is available for all the loaded modules. It's a major pain. The ability to get a basic bt from within Pg is strongly desirable. IIRC gdb's basic unwinder works without external debuginfo, if not especially well. libunwind produces much better results, but that didn't pass the extra-dependency bar when backtracing support was introduced to core postgres. 
On a side note, to help get better diagnostics I've also been meaning to look into building --enable-debug with -ggdb3 so we can embed macro info, and using dwz to deduplicate+compress the debuginfo so we can encourage people to install it by default on production. I also want to start exporting pointers to all the important data symbols for diagnostic use, even if we do so in a separate ELF section just for debug use.
Re: strange error reporting
Robert Haas writes: >>> Maybe it would be better if it said: >>> connection to database at socket "/tmp/.s.PGSQL.5432" failed: FATAL: >>> database "rhaas" does not exist >> I'd be inclined to spell it "connection to server at ... failed", >> but that sort of wording is surely also possible. > "connection to server" rather than "connection to database" works for > me; in fact, I think I like it slightly better. If I don't hear any other opinions, I'll change these messages to "connection to server at socket \"%s\" failed: " "connection to server at \"%s\" (%s), port %s failed: " (or maybe "server on socket"? "at" sounds right for the IP address case, but it feels a little off in the socket pathname case.) regards, tom lane
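For comparison, the two proposed wordings could be exercised with a small helper like the one below. This is a hypothetical sketch, not libpq code: the function name is invented, and the gettext markup that real libpq messages carry is omitted.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the proposed connection-failure prefix,
 * choosing the Unix-socket variant when socket_path is non-NULL and
 * the host/address/port variant otherwise.
 */
static void
format_conn_failure(char *buf, size_t buflen,
                    const char *socket_path,    /* NULL for TCP connections */
                    const char *host, const char *hostaddr, const char *port)
{
    if (socket_path != NULL)
        snprintf(buf, buflen,
                 "connection to server at socket \"%s\" failed: ",
                 socket_path);
    else
        snprintf(buf, buflen,
                 "connection to server at \"%s\" (%s), port %s failed: ",
                 host, hostaddr, port);
}
```

The server's own error (e.g. the FATAL "database does not exist" message) would then be appended after the trailing colon-space.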
Re: POC: postgres_fdw insert batching
On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra wrote: > On 1/21/21 1:17 AM, Zhihong Yu wrote: > > Hi, > > The assignment to resultRelInfo is done when junk_filter_needed is true: > > > > if (junk_filter_needed) > > { > > resultRelInfo = mtstate->resultRelInfo; > > > > Should the code for determining batch size access mtstate->resultRelInfo > > directly ? > > > > IMO the issue is that code iterates over all plans and moves to the next > for each one: > > resultRelInfo++; > > so it ends up pointing past the last element, hence the failures. So > yeah, either the code needs to move before the loop (per my patch), or > we need to access mtstate->resultRelInfo directly.

Accessing mtstate->resultRelInfo directly would do. The only constraint on where this block should be placed is that ri_projectReturning must be valid as of calling GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread. So, after this block in ExecInitModifyTable:

    /*
     * Initialize RETURNING projections if needed.
     */
    if (node->returningLists)
    {
        /*
         * Build a projection for each result rel.
         */
        resultRelInfo = mtstate->resultRelInfo;
        foreach(l, node->returningLists)
        {
            List *rlist = (List *) lfirst(l);

            resultRelInfo->ri_returningList = rlist;
            resultRelInfo->ri_projectReturning =
                ExecBuildProjectionInfo(rlist, econtext, slot,
                                        &mtstate->ps,
                                        resultRelInfo->ri_RelationDesc->rd_att);
            resultRelInfo++;
        }
    }

-- Amit Langote EDB: http://www.enterprisedb.com
Re: POC: postgres_fdw insert batching
Tomas Vondra writes: > I may be wrong, but the most likely explanation seems to be this is due > to the junk filter initialization, which simply moves past the end of > the mtstate->resultRelInfo array.

resultRelInfo is certainly pointing at garbage at that point.

> It kinda seems the GetForeignModifyBatchSize call should happen before > that block. The attached patch fixes this for me (i.e. regression tests > pass with no valgrind reports). > Or did I get that wrong?

Don't we need to initialize ri_BatchSize for *each* resultrelinfo, not merely the first one? That is, this new code needs to be somewhere inside a loop over the result rels. regards, tom lane
Re: POC: postgres_fdw insert batching
Hi, Do we need to consider how this part of code inside ExecInitModifyTable() would evolve ? I think placing the compound condition toward the end of ExecInitModifyTable() is reasonable because it checks the latest information. Regards On Wed, Jan 20, 2021 at 5:11 PM Tomas Vondra wrote: > > > On 1/21/21 2:02 AM, Zhihong Yu wrote: > > Hi, Tomas: > > In my opinion, my patch is a little better. > > Suppose one of the conditions in the if block changes in between the > > start of loop and the end of the loop: > > > > * Determine if the FDW supports batch insert and determine the > batch > > * size (a FDW may support batching, but it may be disabled for the > > * server/table). > > > > My patch would reflect that change. I guess this was the reason the if / > > else block was placed there in the first place. > > > > But can it change? All the loop does is extracting junk attributes from > the plans, it does not modify anything related to the batching. Or maybe > I just don't understand what you mean. > > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: POC: postgres_fdw insert batching
On 1/21/21 2:02 AM, Zhihong Yu wrote: Hi, Tomas: In my opinion, my patch is a little better. Suppose one of the conditions in the if block changes in between the start of loop and the end of the loop: * Determine if the FDW supports batch insert and determine the batch * size (a FDW may support batching, but it may be disabled for the * server/table). My patch would reflect that change. I guess this was the reason the if / else block was placed there in the first place. But can it change? All the loop does is extracting junk attributes from the plans, it does not modify anything related to the batching. Or maybe I just don't understand what you mean. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: POC: postgres_fdw insert batching
Hi, Tomas: In my opinion, my patch is a little better. Suppose one of the conditions in the if block changes in between the start of loop and the end of the loop: * Determine if the FDW supports batch insert and determine the batch * size (a FDW may support batching, but it may be disabled for the * server/table). My patch would reflect that change. I guess this was the reason the if / else block was placed there in the first place. Cheers On Wed, Jan 20, 2021 at 4:56 PM Tomas Vondra wrote: > > > On 1/21/21 1:17 AM, Zhihong Yu wrote: > > Hi, > > The assignment to resultRelInfo is done when junk_filter_needed is true: > > > > if (junk_filter_needed) > > { > > resultRelInfo = mtstate->resultRelInfo; > > > > Should the code for determining batch size access mtstate->resultRelInfo > > directly ? > > > > IMO the issue is that code iterates over all plans and moves to the next > for each one: > > resultRelInfo++; > > so it ends up pointing past the last element, hence the failures. So > yeah, either the code needs to move before the loop (per my patch), or > we need to access mtstate->resultRelInfo directly. > > I'm pretty amazed this did not crash during any of the many regression > runs I did recently. > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: POC: postgres_fdw insert batching
On 1/21/21 1:17 AM, Zhihong Yu wrote: Hi, The assignment to resultRelInfo is done when junk_filter_needed is true: if (junk_filter_needed) { resultRelInfo = mtstate->resultRelInfo; Should the code for determining batch size access mtstate->resultRelInfo directly ? IMO the issue is that code iterates over all plans and moves to the next for each one: resultRelInfo++; so it ends up pointing past the last element, hence the failures. So yeah, either the code needs to move before the loop (per my patch), or we need to access mtstate->resultRelInfo directly. I'm pretty amazed this did not crash during any of the many regression runs I did recently. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
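The failure mode described here can be reduced to a pointer-cursor bug: an earlier loop advances resultRelInfo past the end of the array, so any later pass over the result rels must first re-point the cursor at mtstate->resultRelInfo. A stripped-down sketch, with the structures simplified to the two fields that matter and GetForeignModifyBatchSize() stubbed out as a constant:

```c
#include <assert.h>

/* Simplified stand-ins for the executor structures discussed above. */
typedef struct ResultRelInfo
{
    int ri_BatchSize;
} ResultRelInfo;

typedef struct ModifyTableState
{
    ResultRelInfo *resultRelInfo;   /* array of nplans entries */
    int            nplans;
} ModifyTableState;

/*
 * Sketch of the fix: reset the cursor to element 0 and set ri_BatchSize
 * for *every* result rel, instead of reusing a pointer that a previous
 * loop (e.g. the junk-filter setup) has already advanced past the end.
 */
static void
init_batch_sizes(ModifyTableState *mtstate)
{
    ResultRelInfo *rri = mtstate->resultRelInfo;    /* re-point at start */

    for (int i = 0; i < mtstate->nplans; i++)
    {
        rri->ri_BatchSize = 1;  /* stand-in for GetForeignModifyBatchSize() */
        assert(rri->ri_BatchSize >= 1);
        rri++;
    }
}
```

Reading through a stale cursor instead explains the uninitialized-memory valgrind reports quoted upthread: the dereference lands one element past the allocated array.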
Re: POC: postgres_fdw insert batching
On 1/21/21 12:59 AM, Tom Lane wrote: Tomas Vondra writes: OK, pushed after a little bit of additional polishing (mostly comments). Thanks everyone! florican reports this is seriously broken on 32-bit hardware: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican=2021-01-20%2023%3A08%3A15 First guess is incorrect memory-allocation computations ... I know, although it seems more like an access to uninitialized memory. I've already posted a patch that resolves that for me on 64-bits (per valgrind, I suppose it's the same issue). I'm working on reproducing it on 32-bits, hopefully it won't take long. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Printing LSN made easy
At Wed, 20 Jan 2021 16:40:59 +0900, Michael Paquier wrote in > On Wed, Jan 20, 2021 at 07:25:37AM +0100, Peter Eisentraut wrote: > > It looks like we are not getting any consensus on this approach. One > > reduced version I would consider is just the second part, so you'd write > > something like > > > > snprintf(lsnchar, sizeof(lsnchar), "%X/%X", > > LSN_FORMAT_ARGS(lsn)); > > > > This would still reduce notational complexity quite a bit but avoid any > > funny business with the format strings. > > That seems reasonable to me. So +1. That seems like a good balance. +1, too. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
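The reduced proposal keeps the "%X/%X" format string visible at the call site and hides only the high/low 32-bit split behind the macro. A self-contained sketch, with XLogRecPtr simplified to a plain uint64_t and the casts widened to unsigned int to match %X:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;    /* simplified stand-in for the real typedef */

/*
 * Split an LSN into the two values expected by a "%X/%X" format string:
 * the high 32 bits, then the low 32 bits.
 */
#define LSN_FORMAT_ARGS(lsn) \
    ((unsigned int) ((lsn) >> 32)), ((unsigned int) (lsn))

/* Format an LSN in the usual high/low hex notation, e.g. "1/2AD5F00". */
static void
format_lsn(char *buf, size_t len, XLogRecPtr lsn)
{
    snprintf(buf, len, "%X/%X", LSN_FORMAT_ARGS(lsn));
}
```

Because the format string stays at the call site, translators and grep both still see the literal "%X/%X", which is the "avoid funny business with the format strings" part of the proposal.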
Re: POC: postgres_fdw insert batching
On 1/21/21 12:52 AM, Tomas Vondra wrote: Hmm, seems that florican doesn't like this :-( https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican=2021-01-20%2023%3A08%3A15 It's an i386 machine running FreeBSD, so not sure what exactly it's picky about. But when I tried running this under valgrind, I get some strange failures in the new chunk in ExecInitModifyTable:

    /*
     * Determine if the FDW supports batch insert and determine the batch
     * size (a FDW may support batching, but it may be disabled for the
     * server/table).
     */
    if (!resultRelInfo->ri_usesFdwDirectModify &&
        operation == CMD_INSERT &&
        resultRelInfo->ri_FdwRoutine != NULL &&
        resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
        resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
        resultRelInfo->ri_BatchSize =
            resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
    else
        resultRelInfo->ri_BatchSize = 1;

    Assert(resultRelInfo->ri_BatchSize >= 1);

It seems as if the resultRelInfo is not initialized, or something like that. I wouldn't be surprised if the 32-bit machine was pickier and failing because of that. A sample of the valgrind log is attached. It's pretty much just repetitions of these three reports. OK, it's definitely accessing uninitialized memory, because the resultRelInfo (on line 2801, i.e.
the "if" condition) looks like this: (gdb) p resultRelInfo $1 = (ResultRelInfo *) 0xe595988 (gdb) p *resultRelInfo $2 = {type = 2139062142, ri_RangeTableIndex = 2139062143, ri_RelationDesc = 0x7f7f7f7f7f7f7f7f, ri_NumIndices = 2139062143, ri_IndexRelationDescs = 0x7f7f7f7f7f7f7f7f, ri_IndexRelationInfo = 0x7f7f7f7f7f7f7f7f, ri_TrigDesc = 0x7f7f7f7f7f7f7f7f, ri_TrigFunctions = 0x7f7f7f7f7f7f7f7f, ri_TrigWhenExprs = 0x7f7f7f7f7f7f7f7f, ri_TrigInstrument = 0x7f7f7f7f7f7f7f7f, ri_ReturningSlot = 0x7f7f7f7f7f7f7f7f, ri_TrigOldSlot = 0x7f7f7f7f7f7f7f7f, ri_TrigNewSlot = 0x7f7f7f7f7f7f7f7f, ri_FdwRoutine = 0x7f7f7f7f7f7f7f7f, ri_FdwState = 0x7f7f7f7f7f7f7f7f, ri_usesFdwDirectModify = 127, ri_NumSlots = 2139062143, ri_BatchSize = 2139062143, ri_Slots = 0x7f7f7f7f7f7f7f7f, ri_PlanSlots = 0x7f7f7f7f7f7f7f7f, ri_WithCheckOptions = 0x7f7f7f7f7f7f7f7f, ri_WithCheckOptionExprs = 0x7f7f7f7f7f7f7f7f, ri_ConstraintExprs = 0x7f7f7f7f7f7f7f7f, ri_GeneratedExprs = 0x7f7f7f7f7f7f7f7f, ri_NumGeneratedNeeded = 2139062143, ri_junkFilter = 0x7f7f7f7f7f7f7f7f, ri_returningList = 0x7f7f7f7f7f7f7f7f, ri_projectReturning = 0x7f7f7f7f7f7f7f7f, ri_onConflictArbiterIndexes = 0x7f7f7f7f7f7f7f7f, ri_onConflict = 0x7f7f7f7f7f7f7f7f, ri_PartitionCheckExpr = 0x7f7f7f7f7f7f7f7f, ri_PartitionRoot = 0x7f7f7f7f7f7f7f7f, ri_RootToPartitionMap = 0x8, ri_PartitionTupleSlot = 0x8, ri_ChildToRootMap = 0xe5952b0, ri_CopyMultiInsertBuffer = 0xe596740} (gdb) I may be wrong, but the most likely explanation seems to be this is due to the junk filter initialization, which simply moves past the end of the mtstate->resultRelInfo array. It kinda seems the GetForeignModifyBatchSize call should happen before that block. The attached patch fixes this for me (i.e. regression tests pass with no valgrind reports. Or did I get that wrong? 
-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c36860704..2ac4999dc8 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2672,6 +2672,23 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		}
 	}
 
+	/*
+	 * Determine if the FDW supports batch insert and determine the batch
+	 * size (a FDW may support batching, but it may be disabled for the
+	 * server/table).
+	 */
+	if (!resultRelInfo->ri_usesFdwDirectModify &&
+		operation == CMD_INSERT &&
+		resultRelInfo->ri_FdwRoutine != NULL &&
+		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		resultRelInfo->ri_BatchSize =
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+	else
+		resultRelInfo->ri_BatchSize = 1;
+
+	Assert(resultRelInfo->ri_BatchSize >= 1);
+
 	/* select first subplan */
 	mtstate->mt_whichplan = 0;
 	subplan = (Plan *) linitial(node->plans);
@@ -2793,23 +2810,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		}
 	}
 
-	/*
-	 * Determine if the FDW supports batch insert and determine the batch
-	 * size (a FDW may support batching, but it may be disabled for the
-	 * server/table).
-	 */
-	if (!resultRelInfo->ri_usesFdwDirectModify &&
-		operation == CMD_INSERT &&
-		resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-		resultRelInfo->ri_BatchSize =
-			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
Heap's backwards scan scans the incorrect pages with heap_setscanlimits()
Hackers,

It looks like both heapgettup() and heapgettup_pagemode() are coded incorrectly when setting the page to start the scan on for a backwards scan when heap_setscanlimits() has been used. It looks like the code was not updated during 7516f5259. The current code is:

		/* start from last page of the scan */
		if (scan->rs_startblock > 0)
			page = scan->rs_startblock - 1;
		else
			page = scan->rs_nblocks - 1;

Where rs_startblock is either the sync scan start location, or the start page set by heap_setscanlimits(), and rs_nblocks is the number of blocks in the relation.

Let's say we have a 100 block relation and we want to scan blocks 10 to 30 in backwards order. We'll do heap_setscanlimits(scan, 10, 21); to indicate that we want to scan 21 blocks starting at page 10 and finishing after scanning page 30. What the code above does is wrong. Since rs_startblock is > 0 we'll execute: page = scan->rs_nblocks - 1; i.e. 99, then proceed to scan all blocks down to 78. Oops. Not quite the 10 to 30 that we asked for.

Now, it does not appear that there are any live bugs here, in core at least. The only usage I see of heap_setscanlimits() is in heapam_index_build_range_scan(), where I see the scan is a forward scan. I only noticed the bug as I'm in the middle of fixing up [1] to implement backwards TID Range scans.

Proposed patch attached. Since this is not a live bug, is it worth a backpatch? I guess some extensions could suffer from this; I'm just not sure how likely that is, since if anyone were doing backwards scans with scan limits set, they'd surely have noticed this already!?

David

[1] https://postgr.es/m/CAMyN-kB-nFTkF=va_jpwfno08s0d-yk0f741s2b7ldmyai8...@mail.gmail.com

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index faffbb1865..ddd214b7af 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -603,11 +603,15 @@ heapgettup(HeapScanDesc scan,
 		 * forward scanners.
 		 */
 		scan->rs_base.rs_flags &= ~SO_ALLOW_SYNC;
-		/* start from last page of the scan */
-		if (scan->rs_startblock > 0)
-			page = scan->rs_startblock - 1;
+
+		/*
+		 * Start from last page of the scan.  Ensure we take into account
+		 * rs_numblocks if it's been adjusted by heap_setscanlimits().
+		 */
+		if (scan->rs_numblocks != InvalidBlockNumber)
+			page = (scan->rs_startblock + scan->rs_numblocks - 1) % scan->rs_nblocks;
 		else
-			page = scan->rs_nblocks - 1;
+			page = (scan->rs_startblock + scan->rs_nblocks - 1) % scan->rs_nblocks;
 		heapgetpage((TableScanDesc) scan, page);
 	}
 	else
@@ -918,11 +922,15 @@ heapgettup_pagemode(HeapScanDesc scan,
 		 * forward scanners.
 		 */
 		scan->rs_base.rs_flags &= ~SO_ALLOW_SYNC;
-		/* start from last page of the scan */
-		if (scan->rs_startblock > 0)
-			page = scan->rs_startblock - 1;
+
+		/*
+		 * Start from last page of the scan.  Ensure we take into account
+		 * rs_numblocks if it's been adjusted by heap_setscanlimits().
+		 */
+		if (scan->rs_numblocks != InvalidBlockNumber)
+			page = (scan->rs_startblock + scan->rs_numblocks - 1) % scan->rs_nblocks;
 		else
-			page = scan->rs_nblocks - 1;
+			page = (scan->rs_startblock + scan->rs_nblocks - 1) % scan->rs_nblocks;
 		heapgetpage((TableScanDesc) scan, page);
 	}
 	else
Re: POC: postgres_fdw insert batching
Hi, The assignment to resultRelInfo is done when junk_filter_needed is true:

	if (junk_filter_needed)
	{
		resultRelInfo = mtstate->resultRelInfo;

Should the code for determining batch size access mtstate->resultRelInfo directly?

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c36860704..a6a814454d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2798,17 +2798,17 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * size (a FDW may support batching, but it may be disabled for the
 	 * server/table).
 	 */
-	if (!resultRelInfo->ri_usesFdwDirectModify &&
+	if (!mtstate->resultRelInfo->ri_usesFdwDirectModify &&
 		operation == CMD_INSERT &&
-		resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-		resultRelInfo->ri_BatchSize =
-			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+		mtstate->resultRelInfo->ri_FdwRoutine != NULL &&
+		mtstate->resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		mtstate->resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		mtstate->resultRelInfo->ri_BatchSize =
+			mtstate->resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(mtstate->resultRelInfo);
 	else
-		resultRelInfo->ri_BatchSize = 1;
+		mtstate->resultRelInfo->ri_BatchSize = 1;
 
-	Assert(resultRelInfo->ri_BatchSize >= 1);
+	Assert(mtstate->resultRelInfo->ri_BatchSize >= 1);
 
 	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it

Cheers

On Wed, Jan 20, 2021 at 3:52 PM Tomas Vondra wrote: > Hmm, seems that florican doesn't like this :-( > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15 > > It's an i386 machine running FreeBSD, so not sure what exactly it's picky > about. 
But when I tried running this under valgrind, I get some strange > failures in the new chunk in ExecInitModifyTable: > >/* > * Determine if the FDW supports batch insert and determine the batch > * size (a FDW may support batching, but it may be disabled for the > * server/table). > */ >if (!resultRelInfo->ri_usesFdwDirectModify && >operation == CMD_INSERT && >resultRelInfo->ri_FdwRoutine != NULL && >resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize && >resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert) >resultRelInfo->ri_BatchSize = > > resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo); >else >resultRelInfo->ri_BatchSize = 1; > >Assert(resultRelInfo->ri_BatchSize >= 1); > > It seems as if the resultRelInfo is not initialized, or something like > that. I wouldn't be surprised if the 32-bit machine was pickier and > failing because of that. > > A sample of the valgrind log is attached. It's pretty much just > repetitions of these three reports. > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: POC: postgres_fdw insert batching
Tomas Vondra writes: > OK, pushed after a little bit of additional polishing (mostly comments). > Thanks everyone! florican reports this is seriously broken on 32-bit hardware: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15 First guess is incorrect memory-allocation computations ... regards, tom lane
Re: list of extended statistics on psql
Hi Tomas and hackers, On 2021/01/21 7:00, Tomas Vondra wrote: I created patches and my test results on PG10, 11, 12, and 14 are fine.

0001:
- Fix query to use pg_statistic_ext only
- Replace statuses "required" and "built" with "defined"
- Remove the size columns
- Fix document
- Add schema name as a filter condition on the query

0002:
- Fix all results of \dX
- Add new testcase by non-superuser

Please find attached files. :-D Thanks, I've pushed this. I had to tweak the regression tests a bit, for two reasons: 1) to change user in regression tests, don't use \connect, but SET ROLE and RESET ROLE 2) roles in regression tests should use names with regress_ prefix Thanks for reviewing many times and committing the feature! I understood 1) and 2). I'll keep that in mind for the next patch I develop. Then, if possible, could you add Justin to the commit message as a reviewer? I revised the document partly based on his comments. Finally, as extended statistics become more widely used, this feature will become more useful. I hope it helps DBAs. :-D Thanks, Tatsuro Yamada
Re: POC: postgres_fdw insert batching
Hmm, seems that florican doesn't like this :-( https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15 It's an i386 machine running FreeBSD, so not sure what exactly it's picky about. But when I tried running this under valgrind, I get some strange failures in the new chunk in ExecInitModifyTable:

	/*
	 * Determine if the FDW supports batch insert and determine the batch
	 * size (a FDW may support batching, but it may be disabled for the
	 * server/table).
	 */
	if (!resultRelInfo->ri_usesFdwDirectModify &&
		operation == CMD_INSERT &&
		resultRelInfo->ri_FdwRoutine != NULL &&
		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
		resultRelInfo->ri_BatchSize =
			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
	else
		resultRelInfo->ri_BatchSize = 1;

	Assert(resultRelInfo->ri_BatchSize >= 1);

It seems as if the resultRelInfo is not initialized, or something like that. I wouldn't be surprised if the 32-bit machine was pickier and failing because of that. A sample of the valgrind log is attached. It's pretty much just repetitions of these three reports.
regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

==186005== Invalid read of size 1
==186005==    at 0x759C12: ExecInitModifyTable (nodeModifyTable.c:2801)
==186005==    by 0x720B35: ExecInitNode (execProcnode.c:174)
==186005==    by 0x716F99: InitPlan (execMain.c:939)
==186005==    by 0x715F47: standard_ExecutorStart (execMain.c:266)
==186005==    by 0x715CD3: ExecutorStart (execMain.c:148)
==186005==    by 0x9495B1: ProcessQuery (pquery.c:155)
==186005==    by 0x94AF24: PortalRunMulti (pquery.c:1267)
==186005==    by 0x94A4C8: PortalRun (pquery.c:779)
==186005==    by 0x94438E: exec_simple_query (postgres.c:1240)
==186005==    by 0x9485F5: PostgresMain (postgres.c:4394)
==186005==    by 0x88D1D4: BackendRun (postmaster.c:4484)
==186005==    by 0x88CB53: BackendStartup (postmaster.c:4206)
==186005==    by 0x889205: ServerLoop (postmaster.c:1730)
==186005==    by 0x888AD0: PostmasterMain (postmaster.c:1402)
==186005==    by 0x78D466: main (main.c:209)
==186005== Address 0xe594f78 is 1,864 bytes inside a block of size 8,192 alloc'd
==186005==    at 0x483A809: malloc (vg_replace_malloc.c:307)
==186005==    by 0xAFEFE6: AllocSetContextCreateInternal (aset.c:468)
==186005==    by 0x7298F9: CreateExprContextInternal (execUtils.c:252)
==186005==    by 0x7299DD: CreateExprContext (execUtils.c:302)
==186005==    by 0x729C5E: ExecAssignExprContext (execUtils.c:481)
==186005==    by 0x75D24D: ExecInitSeqScan (nodeSeqscan.c:147)
==186005==    by 0x720BEF: ExecInitNode (execProcnode.c:207)
==186005==    by 0x716F99: InitPlan (execMain.c:939)
==186005==    by 0x715F47: standard_ExecutorStart (execMain.c:266)
==186005==    by 0x715CD3: ExecutorStart (execMain.c:148)
==186005==    by 0x949E64: PortalStart (pquery.c:505)
==186005==    by 0x9442A2: exec_simple_query (postgres.c:1201)
==186005==    by 0x9485F5: PostgresMain (postgres.c:4394)
==186005==    by 0x88D1D4: BackendRun (postmaster.c:4484)
==186005==    by 0x88CB53: BackendStartup (postmaster.c:4206)
==186005==    by 0x889205: ServerLoop (postmaster.c:1730)
==186005==    by 0x888AD0: PostmasterMain (postmaster.c:1402)
==186005==    by 0x78D466: main (main.c:209)
==186005== 
{
   Memcheck:Addr1
   fun:ExecInitModifyTable
   fun:ExecInitNode
   fun:InitPlan
   fun:standard_ExecutorStart
   fun:ExecutorStart
   fun:ProcessQuery
   fun:PortalRunMulti
   fun:PortalRun
   fun:exec_simple_query
   fun:PostgresMain
   fun:BackendRun
   fun:BackendStartup
   fun:ServerLoop
   fun:PostmasterMain
   fun:main
}
==186005== Invalid write of size 4
==186005==    at 0x759C74: ExecInitModifyTable (nodeModifyTable.c:2809)
==186005==    by 0x720B35: ExecInitNode (execProcnode.c:174)
==186005==    by 0x716F99: InitPlan (execMain.c:939)
==186005==    by 0x715F47: standard_ExecutorStart (execMain.c:266)
==186005==    by 0x715CD3: ExecutorStart (execMain.c:148)
==186005==    by 0x9495B1: ProcessQuery (pquery.c:155)
==186005==    by 0x94AF24: PortalRunMulti (pquery.c:1267)
==186005==    by 0x94A4C8: PortalRun (pquery.c:779)
==186005==    by 0x94438E: exec_simple_query (postgres.c:1240)
==186005==    by 0x9485F5: PostgresMain (postgres.c:4394)
==186005==    by 0x88D1D4: BackendRun (postmaster.c:4484)
==186005==    by 0x88CB53: BackendStartup (postmaster.c:4206)
==186005==    by 0x889205: ServerLoop (postmaster.c:1730)
==186005==    by 0x888AD0: PostmasterMain (postmaster.c:1402)
==186005==    by 0x78D466: main (main.c:209)
==186005== Address 0xe594f80 is 1,872 bytes inside a block of size 8,192 alloc'd
==186005==    at 0x483A809: malloc (vg_replace_malloc.c:307)
==186005==    by 0xAFEFE6: AllocSetContextCreateInternal (aset.c:468)
==186005==    by 0x7298F9: CreateExprContextInternal (execUtils.c:252)
==186005==    by
Re: [PATCH 1/1] Fix detection of pwritev support for OSX.
On Wed, Jan 20, 2021 at 4:07 PM Tom Lane wrote: > > James Hilliard writes: > > On Tue, Jan 19, 2021 at 6:37 PM Tom Lane wrote: > >> I've found no direct means to control the > >> SDK path at all, but so far it appears that "xcrun --show-sdk-path" > >> agrees with the compiler's default -isysroot path as seen in the > >> compiler's -v output. I suspect that this isn't coincidental, > >> but reflects xcrun actually being used in the compiler launch > >> process. If it were to flip over to using a IOS SDK, that would > >> mean that bare "cc" would generate nonfunctional executables, > >> which just about any onlooker would agree is broken. > > > So there's some more weirdness involved here, whether or not you > > have the command line install seems to affect the output of the > > "xcrun --show-sdk-path" command, but not the > > "xcrun --sdk macosx --show-sdk-path" command. > > Yeah, that's what we discovered in the other thread. It seems that > with "--sdk macosx" you'll always get a pointer to the (solitary) > SDK under /Applications/Xcode.app, but with the short "xcrun > --show-sdk-path" command you might get either that or a pointer to > something under /Library/Developer/CommandLineTools. > > I now believe what is actually happening with the short command is > that it's iterating through the available SDKs (according to some not > very clear search path) and picking the first one it finds that > matches the host system version. That matches the ktrace evidence > that shows it reading the SDKSettings.plist file in each SDK > directory. The fact that it can seize on either an actual directory > or an equivalent symlink might be due to chance ordering of directory > entries. (It'd be interesting to see "ls -f" output for your > /Library/Developer/CommandLineTools/SDKs directory ... though if Well at the moment I completely deleted that directory...and the build works fine with my patch still. 
> you've been experimenting with deinstall/reinstall, there's no > reason to suppose the entry order is still the same.) > > I'm not sure that the case of not having the "command line tools" > installed is interesting for our purposes. AFAIK you have to have > that in order to have access to required tools like bison and gmake. > (That reminds me, I was intending to add something to our docs > about how-to-build-from-source to say that you need to install those.) Yeah, not 100% sure but I was able to build just fine after deleting my command line tools. I think it just switched to using the normal SDK toolchain, I guess that's the fallback logic doing that. It would be pretty annoying to have to install an outdated SDK just to build postgres for no other reason than the autoconf feature detection being broken. > > > Note that with my patch the binaries will always be compatible with the > > host system by default, even if the SDK is capable of producing binaries > > that are incompatible so building postgres works with and without the > > command line tools SDK. > > Yeah. I don't see that as a benefit actually. Adding the > -no_weak_imports linker switch (or the other one you're suggesting) > means that you *cannot* cross-compile for a newer macOS version, > even if you set PG_SYSROOT and/or MACOSX_DEPLOYMENT_TARGET with the > intention of doing so. Best I can tell this isn't true, I was able to cross compile for a newer MACOSX_DEPLOYMENT_TARGET than my build host just fine. The binary fails with a "Symbol not found: _pwritev" error when I try to run it on the system that built it. In regards to the -no_weak_imports switch...that is something different from my understanding as it just strips the weak imports forcing the fallback code paths to be taken instead, essentially functioning as if the weak symbols are never available. 
It's largely separate from the deployment target, from my understanding, as weak symbols are a feature that lets you use newer syscalls while still providing backwards-compatible fallbacks for older systems. > You'll still get a build that reflects the set > of kernel calls available on the host system. Admittedly, this is a > case that's not likely to be of interest to very many people, but > I don't see why a method with that restriction is superior to picking > a default SDK that matches the host system (and can be overridden). But to fix the build when using a newer SDK, overriding the SDK location does not help; you would have to override the broken feature detection. > > > So I think "xcrun --sdk macosx --show-sdk-path" is probably preferable > but either should work as long as we can properly detect deployment > target symbol availability, regardless this SDK sysroot selection issue is > effectively an entirely different issue from the feature detection not > properly > respecting the configured deployment target. > > No, I think it's pretty much equivalent. If we pick the right SDK > then we'll get the build we want. Generally any recent SDK installed should
Re: Allow matching whole DN from a client certificate
On Wed, 2021-01-20 at 19:07 +0000, Jacob Champion wrote: > I think you'll want to be careful to specify the format as much as > possible, both to make sure that other backend TLS implementations can > actually use the same escaping system and to ensure that user regexes > don't suddenly start matching different things at some point in the > future. Along those lines: the current implementation doesn't escape commas in fields, which means you can inject them to force a bad regex match. For instance, when using the usermap that's in the patch:

dn	"/^.*OU=Testing,.*$"	username

if I create a certificate with the Organizational Unit name "Testing, or something", then that will also match. Switching to RFC 2253/4514 quoting fixes comma injection (and reverses the order of the RDNs, which requires a change to the regex patterns). But I think that the regex as supplied still isn't strong enough to prevent problems. For example, the equals sign isn't a delimiter and therefore isn't quoted. So if I get my CA to sign a certificate with some arbitrary field value of "HEY YOU=Testing", then that will also match the above usermap. You'd need to write the regex with extreme attention to detail and a full understanding of the escaping scheme to get around that -- assuming that the scheme is generally parsable with regexes to begin with. > I'm going to test this patch with some UTF-8 DNs later today; I'll share my > findings. UTF-8 has the opposite issue; it's escaped in a way that makes it unusable in a regex match. For example, say I have a (simple for the sake of example, but broken as noted above) usermap of

dn	"/^CN=([^,]*).*$"	\1

which is supposed to emulate the functionality of the "clientname=CN" mode, and two users named "postgres" and "οδυσσέας". The "CN=postgres" user will work just fine, but the UTF-8 CN of "οδυσσέας" will be escaped into "\U03BF\U03B4\U03C5\U03C3\U03C3\U03AD\U03B1\U03C2" and fail to match the internal user.
(I'm not seeing an RFC describe the "\U" escaping scheme; maybe it's OpenSSL-specific?) --Jacob
Re: [PATCH 1/1] Fix detection of pwritev support for OSX.
James Hilliard writes: > On Tue, Jan 19, 2021 at 6:37 PM Tom Lane wrote: >> I've found no direct means to control the >> SDK path at all, but so far it appears that "xcrun --show-sdk-path" >> agrees with the compiler's default -isysroot path as seen in the >> compiler's -v output. I suspect that this isn't coincidental, >> but reflects xcrun actually being used in the compiler launch >> process. If it were to flip over to using a IOS SDK, that would >> mean that bare "cc" would generate nonfunctional executables, >> which just about any onlooker would agree is broken. > So there's some more weirdness involved here, whether or not you > have the command line install seems to affect the output of the > "xcrun --show-sdk-path" command, but not the > "xcrun --sdk macosx --show-sdk-path" command. Yeah, that's what we discovered in the other thread. It seems that with "--sdk macosx" you'll always get a pointer to the (solitary) SDK under /Applications/Xcode.app, but with the short "xcrun --show-sdk-path" command you might get either that or a pointer to something under /Library/Developer/CommandLineTools. I now believe what is actually happening with the short command is that it's iterating through the available SDKs (according to some not very clear search path) and picking the first one it finds that matches the host system version. That matches the ktrace evidence that shows it reading the SDKSettings.plist file in each SDK directory. The fact that it can seize on either an actual directory or an equivalent symlink might be due to chance ordering of directory entries. (It'd be interesting to see "ls -f" output for your /Library/Developer/CommandLineTools/SDKs directory ... though if you've been experimenting with deinstall/reinstall, there's no reason to suppose the entry order is still the same.) I'm not sure that the case of not having the "command line tools" installed is interesting for our purposes. 
AFAIK you have to have that in order to have access to required tools like bison and gmake. (That reminds me, I was intending to add something to our docs about how-to-build-from-source to say that you need to install those.) > Note that with my patch the binaries will always be compatible with the > host system by default, even if the SDK is capable of producing binaries > that are incompatible so building postgres works with and without the > command line tools SDK. Yeah. I don't see that as a benefit actually. Adding the -no_weak_imports linker switch (or the other one you're suggesting) means that you *cannot* cross-compile for a newer macOS version, even if you set PG_SYSROOT and/or MACOSX_DEPLOYMENT_TARGET with the intention of doing so. You'll still get a build that reflects the set of kernel calls available on the host system. Admittedly, this is a case that's not likely to be of interest to very many people, but I don't see why a method with that restriction is superior to picking a default SDK that matches the host system (and can be overridden). > So I think "xcrun --sdk macosx --show-sdk-path" is probably preferable > but either should work as long as we can properly detect deployment > target symbol availability, regardless this SDK sysroot selection issue is > effectively an entirely different issue from the feature detection not > properly > respecting the configured deployment target. No, I think it's pretty much equivalent. If we pick the right SDK then we'll get the build we want. regards, tom lane
Re: POC: postgres_fdw insert batching
OK, pushed after a little bit of additional polishing (mostly comments). Thanks everyone! -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: ResourceOwner refactoring
On 19/01/2021 11:09, Heikki Linnakangas wrote: On 18/01/2021 18:11, Robert Haas wrote: On Mon, Jan 18, 2021 at 11:11 AM Robert Haas wrote: On Mon, Jan 18, 2021 at 10:19 AM Heikki Linnakangas wrote: On 18/01/2021 16:34, Alvaro Herrera wrote: So according to your performance benchmark, we're willing to accept a 30% performance loss on an allegedly common operation -- numkeep=0 numsnaps=10 becomes 49.8ns from 37.6ns. That seems a bit shocking. Maybe you can claim that these operations aren't exactly hot spots, and so the fact that we remain in the same power-of-ten is sufficient. Is that the argument? That's right. The fast path is fast, and that's important. The slow path becomes 30% slower, but that's acceptable. I don't know whether a 30% slowdown will hurt anybody, but it seems like kind of a lot, and I'm not sure I understand what corresponding benefit we're getting. The benefit is to make it easy for extensions to use resource owners to track whatever resources they need to track. And arguably, the new mechanism is nicer for built-in code, too. I'll see if I can optimize the slow paths, to make it more palatable. Ok, here's a new set of patches, and new test results. I replaced the hash function with a cheaper one. I also added the missing wrappers that Alvaro and Hayato Kuroda commented on, and fixed the typos that Michael Paquier pointed out. In the test script, I increased the number of iterations used in the tests, to make them run longer and produce more stable results. There is still a fair amount of jitter in the results, so take any particular number with a grain of salt, but the overall trend is repeatable. 
The results now look like this:

Unpatched
---------

postgres=# \i snaptest.sql
 numkeep | numsnaps | lifo_time_ns | fifo_time_ns
---------+----------+--------------+--------------
       0 |        1 |         32.7 |         32.3
       0 |        5 |         33.0 |         32.9
       0 |       10 |         34.1 |         35.5
       0 |       60 |         32.5 |         38.3
       0 |       70 |         47.6 |         48.9
       0 |      100 |         47.9 |         49.3
       0 |     1000 |         52.9 |         52.7
       0 |    10000 |         61.7 |         62.4
       9 |       10 |         38.4 |         37.6
       9 |      100 |         42.3 |         42.3
       9 |     1000 |         43.9 |         45.0
       9 |    10000 |         62.2 |         62.5
      65 |       70 |         42.4 |         42.9
      65 |      100 |         43.2 |         43.9
      65 |     1000 |         44.0 |         45.1
      65 |    10000 |         62.4 |         62.6
(16 rows)

Patched
-------

postgres=# \i snaptest.sql
 numkeep | numsnaps | lifo_time_ns | fifo_time_ns
---------+----------+--------------+--------------
       0 |        1 |         32.8 |         34.2
       0 |        5 |         33.8 |         32.2
       0 |       10 |         37.2 |         40.2
       0 |       60 |         40.2 |         45.1
       0 |       70 |         40.9 |         48.4
       0 |      100 |         41.9 |         45.2
       0 |     1000 |         49.0 |         55.6
       0 |    10000 |         62.4 |         67.2
       9 |       10 |         33.6 |         33.0
       9 |      100 |         39.6 |         39.7
       9 |     1000 |         42.2 |         45.0
       9 |    10000 |         60.7 |         63.9
      65 |       70 |         32.5 |         33.6
      65 |      100 |         37.5 |         39.7
      65 |     1000 |         42.3 |         46.3
      65 |    10000 |         61.9 |         64.8
(16 rows)

For easier side-by-side comparison, here are the same numbers with the patched and unpatched results side by side, and their ratio (ratio < 1 means the patched version is faster):

LIFO tests:
 numkeep | numsnaps | unpatched | patched | ratio
---------+----------+-----------+---------+-------
       0 |        1 |      32.7 |    32.8 |  1.00
       0 |        5 |      33.0 |    33.8 |  1.02
       0 |       10 |      34.1 |    37.2 |  1.09
       0 |       60 |      32.5 |    40.2 |  1.24
       0 |       70 |      47.6 |    40.9 |  0.86
       0 |      100 |      47.9 |    41.9 |  0.87
       0 |     1000 |      52.9 |    49.0 |  0.93
       0 |    10000 |      61.7 |    62.4 |  1.01
       9 |       10 |      38.4 |    33.6 |  0.88
       9 |      100 |      42.3 |    39.6 |  0.94
       9 |     1000 |      43.9 |    42.2 |  0.96
       9 |    10000 |      62.2 |    60.7 |  0.98
      65 |       70 |      42.4 |    32.5 |  0.77
      65 |      100 |      43.2 |    37.5 |  0.87
      65 |     1000 |      44.0 |    42.3 |  0.96
      65 |    10000 |      62.4 |    61.9 |  0.99
(16 rows)

FIFO tests:
 numkeep | numsnaps | unpatched | patched | ratio
---------+----------+-----------+---------+-------
       0 |        1 |      32.3 |    34.2 |  1.06
       0 |        5 |      32.9 |    32.2 |  0.98
       0 |       10 |      35.5 |    40.2 |  1.13
Re: pg_upgrade fails with non-standard ACL
On 03.01.2021 14:29, Noah Misch wrote: On Thu, Jun 11, 2020 at 07:58:43PM +0300, Anastasia Lubennikova wrote: On 08.06.2020 19:31, Alvaro Herrera wrote: I'm thinking what's a good way to have a test that's committable. Maybe it would work to add a TAP test to pg_upgrade that runs initdb, does a few GRANTS as per your attachment, then runs pg_upgrade? Currently the pg_upgrade tests are not TAP ... we'd have to revive https://postgr.es/m/20180126080026.gi17...@paquier.xyz (Some progress was already made on the buildfarm front to permit this) I would be glad to add some test, but it seems to me that the infrastructure changes for a cross-version pg_upgrade test are a much more complicated task than this modest bugfix. Agreed. Thank you for the review. New version of the patch is attached, though I haven't tested it properly yet. I am planning to do that in a couple of days.

--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
+static void
+check_for_changed_signatures(void)
+{
+	snprintf(output_path, sizeof(output_path), "revoke_objects.sql");

If an upgrade deleted function pg_last_wal_replay_lsn, upgrading a database in which "REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM public" happened requires a GRANT. Can you use a file name that will still make sense if someone enhances pg_upgrade to generate such GRANT statements? I changed the name to 'fix_system_objects_ACL.sql'. Does it look good to you?

+	if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
+		pg_fatal("could not open file \"%s\": %s\n",
+				 output_path, strerror(errno));

Use %m instead of passing strerror(errno) to %s. Done.

+	}
+
+	/* Handle columns separately */
+	if (strstr(aclinfo->obj_type, "column") != NULL)
+	{
+		char	   *pdot = last_dot_location(aclinfo->obj_ident);

I'd write strrchr(aclinfo->obj_ident, '.') instead of introducing last_dot_location(). This assumes column names don't contain '.'; how difficult would it be to remove that assumption?
If it's difficult, a cheap approach would be to ignore any entry containing too many dots. We're not likely to create such column names at initdb time, but v13 does allow: ALTER VIEW pg_locks RENAME COLUMN objid TO "a.b"; GRANT SELECT ("a.b") ON pg_locks TO backup; After which revoke_objects.sql has: psql:./revoke_objects.sql:9: ERROR: syntax error at or near "") ON pg_catalog.pg_locks."" LINE 1: REVOKE ALL (b") ON pg_catalog.pg_locks."a FROM "backup"; While that ALTER VIEW probably should have required allow_system_table_mods, we need to assume such databases exist. I've fixed it by using pg_identify_object_as_address(). Now we don't have to parse obj_identity. Overall, this patch predicts a subset of cases where pg_dump will emit a failing GRANT or REVOKE that targets a pg_catalog object. Can you write a code comment stating what it does and does not detect? I think it's okay to not predict every failure, but let's record the intended coverage. Here are a few examples I know; there may be more. The above query only finds GRANTs to non-pinned roles. pg_dump dumps the following, but the above query doesn't find them: REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM public; GRANT EXECUTE ON FUNCTION pg_reload_conf() TO pg_signal_backend; The above query should test refclassid. This function should not run queries against servers older than 9.6, because pg_dump emits GRANT/REVOKE for pg_catalog objects only when targeting 9.6 or later. An upgrade from 8.4 to master is failing on the above query: Fixed. The patch adds many lines wider than 78 columns, this being one example. In general, break such lines. (Don't break string literal arguments of ereport(), though.) Fixed. 
nm -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c index 43fc297eb6..429e4468f2 100644 --- a/src/bin/pg_upgrade/check.c +++ b/src/bin/pg_upgrade/check.c @@ -16,6 +16,7 @@ static void check_new_cluster_is_empty(void); static void check_databases_are_compatible(void); +static void check_for_changed_signatures(void); static void check_locale_and_encoding(DbInfo *olddb, DbInfo *newdb); static bool equivalent_locale(int category, const char *loca, const char *locb); static void check_is_install_user(ClusterInfo *cluster); @@ -151,6 +152,8 @@ check_and_dump_old_cluster(bool live_check) if (GET_MAJOR_VERSION(old_cluster.major_version) <= 804) new_9_0_populate_pg_largeobject_metadata(&old_cluster, true); + get_non_default_acl_infos(&old_cluster); + /* * While not a check option, we do this now because this is the
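For readers following along: the reason pg_identify_object_as_address() sidesteps the quoting problem above is that it returns the name parts as separate array elements rather than one pre-quoted identity string that would have to be re-parsed. A rough illustration (object chosen arbitrarily; the column case simply adds the attribute name as another array element, so a column named "a.b" stays unambiguous):

```sql
-- pg_dump-style identity string: quoting must be parsed back apart.
SELECT pg_identify_object('pg_class'::regclass, 'pg_catalog.pg_locks'::regclass, 0);

-- Address form: schema and name come back as distinct array elements,
-- so no dot- or quote-parsing is ever needed.
SELECT type, object_names, object_args
FROM pg_identify_object_as_address('pg_class'::regclass,
                                   'pg_catalog.pg_locks'::regclass, 0);
```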
Re: list of extended statistics on psql
On 1/20/21 7:41 AM, Tatsuro Yamada wrote: Hi Tomas, On 2021/01/20 11:35, Tatsuro Yamada wrote: Apologies for all the extra work - I hadn't realized this flaw when pushing for showing more stuff :-( Don't worry about it. We didn't notice the problem even though multiple people looked at it on -hackers. Let's keep moving forward. :-D I'll include a regression test in the next patch. I created the patches, and my test results on PG10, 11, 12, and 14 are fine.

0001:
- Fix query to use pg_statistic_ext only
- Replace statuses "required" and "built" with "defined"
- Remove the size columns
- Fix documentation
- Add schema name as a filter condition on the query

0002:
- Fix all results of \dX
- Add a new test case run by a non-superuser

Please find attached files. :-D

Thanks, I've pushed this. I had to tweak the regression tests a bit, for two reasons:
1) to change the user in regression tests, don't use \connect, but SET ROLE and RESET ROLE
2) roles in regression tests should use names with a regress_ prefix

regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
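For anyone adapting similar tests, the two conventions Tomas mentions might look like this in a regression script (the role name is illustrative, not from the committed test):

```sql
CREATE ROLE regress_stats_user;   -- test roles need the regress_ prefix
SET ROLE regress_stats_user;      -- switch user without a \connect
\dX                               -- exercise the command as a non-superuser
RESET ROLE;
DROP ROLE regress_stats_user;
```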
Re: [PATCH 1/1] Fix detection of pwritev support for OSX.
On Tue, Jan 19, 2021 at 6:37 PM Tom Lane wrote: > > James Hilliard writes: > > Actually, this path looks wrong in general; the value for > > "xcrun --sdk macosx --show-sdk-path" should take precedence over > > "xcrun --show-sdk-path", as the latter may potentially be used for iOS. > > What is "potentially"?

Well, I'm not sure the SDK parameter always defaults to macos, although I guess it probably does, as I couldn't figure out a way to change it:

$ xcodebuild -showsdks
iOS SDKs: iOS 14.3 -sdk iphoneos14.3
iOS Simulator SDKs: Simulator - iOS 14.3 -sdk iphonesimulator14.3
macOS SDKs: DriverKit 20.2 -sdk driverkit.macosx20.2
            macOS 11.1 -sdk macosx11.1
tvOS SDKs: tvOS 14.3 -sdk appletvos14.3
tvOS Simulator SDKs: Simulator - tvOS 14.3 -sdk appletvsimulator14.3
watchOS SDKs: watchOS 7.2 -sdk watchos7.2
watchOS Simulator SDKs: Simulator - watchOS 7.2 -sdk watchsimulator7.2

> I've found no direct means to control the SDK path at all, but so far it appears that "xcrun --show-sdk-path" agrees with the compiler's default -isysroot path as seen in the compiler's -v output. I suspect that this isn't coincidental, but reflects xcrun actually being used in the compiler launch process. If it were to flip over to using an iOS SDK, that would mean that bare "cc" would generate nonfunctional executables, which just about any onlooker would agree is broken.

So there's some more weirdness involved here: whether or not you have the command line tools installed seems to affect the output of the "xcrun --show-sdk-path" command, but not the "xcrun --sdk macosx --show-sdk-path" command. This is what I get without the command line tools:

$ xcrun --show-sdk-path
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
$ xcrun --sdk macosx --show-sdk-path
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk

(the first of these is just a symlink to the second path).
With the command line tools this is different, however:

$ xcrun --show-sdk-path
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
$ xcrun --sdk macosx --show-sdk-path
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk

Note that /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk is different from the normal SDK and doesn't seem to be able to generate binaries that target an 11.0 deployment target on my 10.15 system; however, I am unsure if this behavior can be relied upon. So in terms of what works best, the newer normal SDK has the most flexibility, as it can produce both 10.15-target and 11.0-target binaries depending on MACOSX_DEPLOYMENT_TARGET, while the command line tools SDK, it would appear, can only produce 10.15-target binaries. Note that with my patch the binaries will always be compatible with the host system by default, even if the SDK is capable of producing incompatible binaries, so building postgres works both with and without the command line tools SDK. So I think "xcrun --sdk macosx --show-sdk-path" is probably preferable, but either should work as long as we can properly detect deployment-target symbol availability. Regardless, this SDK sysroot selection issue is effectively a separate issue from the feature detection not properly respecting the configured deployment target.

> > I'm really not excited about trying to make the build work with > a non-native SDK as you are proposing. I think that's just going > to lead to a continuing stream of problems, because of Apple's > opinions about how cross-version compatibility should work.

Well, the minimum required target version is pretty much strictly based on MACOSX_DEPLOYMENT_TARGET, so our feature detection still needs to use that; otherwise cross-target compilation for newer or older targets will not work correctly.
From my understanding, the reason AC_REPLACE_FUNCS does not throw an error for deployment-target-incompatible functions is that it only checks whether the function exists, not whether it is actually usable; this is why I had to add an explicit AC_LANG_PROGRAM compile test to properly trigger a compile failure if the function is not usable for a particular deployment target version. Merely checking whether the function exists in the header is not sufficient. > It also seems like unnecessary complexity, because there is always > (AFAICS) a native SDK version available. We just need to find it. Best I can tell, this is not true: it is some (most?) of the time, but it's not something we can rely upon, as systems may contain only a newer SDK. This newer SDK is still capable of producing binaries that can run on the build host system, so this shouldn't be an issue as long as we can do target feature detection properly. > > regards, tom lane
Re: Deleting older versions in unique indexes to avoid page splits
On Wed, Jan 20, 2021 at 10:53 AM Peter Geoghegan wrote: > This patch is unusual in that you really need to think about emergent > behaviors to understand it. That is certainly a difficult thing to do, > and it's understandable that even an expert might not grok it without > considering it carefully. I happened to stumble upon a recent blog post that seems like a light, approachable introduction to some of the key concepts here: https://jessitron.com/2021/01/18/when-costs-are-nonlinear-keep-it-small/ Bottom-up index deletion enhances a complex system whose maintenance costs are *dramatically* nonlinear, at least in many important cases. If you apply linear thinking to such a system then you'll probably end up with a bad design. The system as a whole is made efficient by making sure that we're lazy when that makes sense, while also making sure that we're eager when that makes sense. So it almost *has* to be structured as a bottom-up, reactive mechanism -- no other approach is able to ramp up or down in exactly the right way. Talking about small cost differences (things that can easily be empirically measured, perhaps with a microbenchmark) is almost irrelevant to the big picture. It's even irrelevant to the "medium picture". What's more, it's basically a mistake to think of heap page accesses that don't yield any deletable index tuples as wasted effort (even though that's how I describe them myself!). Here's why: we have to access the heap page to learn that it has nothing for us in the first place! If we somehow knew ahead of time that some useless-to-us heap block was useless, then the whole system wouldn't be bottom-up (by definition). In other words, failing to get any index tuple deletes from an entire heap page *is itself a form of feedback* at the local level -- it guides the entire system's behavior over time. Why should we expect to get that information at zero cost?
This is somehow both simple and complicated, which creates huge potential for miscommunications. I tried to describe this in various ways at various points. Perhaps I could have done better with that. -- Peter Geoghegan
Re: poc - possibility to write window function in PL languages
On Wed, 20 Jan 2021 at 21:32, Tom Lane wrote: > Pavel Stehule writes: > > On Wed, 20 Jan 2021 at 21:07, Tom Lane wrote: > >> Uh, what? I don't understand what this "partition context" is. > > > It was my name for access to the window partition's local memory - > > WinGetPartitionLocalMemory > > Ah. > > > We need some interface for this cache > > I'm not convinced we need to expose that, or that it'd be very > satisfactory to plpgsql users if we did. The fact that it's fixed-size > and initializes to zeroes are both things that are okay for C programmers > but might be awkward to deal with in plpgsql code. At the very least it > would greatly constrain what data types you could usefully store. > > So I'd be inclined to leave that out, at least for the first version. > I think this functionality is relatively important. If somebody tries to implement their own window function, they will start with some variation of the row_number function. We can support only fixed-length types to begin with. Regards Pavel > > regards, tom lane >
Re: poc - possibility to write window function in PL languages
Pavel Stehule writes: > On Wed, 20 Jan 2021 at 21:07, Tom Lane wrote: >> Uh, what? I don't understand what this "partition context" is. > It was my name for access to the window partition's local memory - > WinGetPartitionLocalMemory Ah. > We need some interface for this cache I'm not convinced we need to expose that, or that it'd be very satisfactory to plpgsql users if we did. The fact that it's fixed-size and initializes to zeroes are both things that are okay for C programmers but might be awkward to deal with in plpgsql code. At the very least it would greatly constrain what data types you could usefully store. So I'd be inclined to leave that out, at least for the first version. regards, tom lane
Re: poc - possibility to write window function in PL languages
On Wed, 20 Jan 2021 at 21:07, Tom Lane wrote: > Pavel Stehule writes: > > The second question is how to work with the partition context value. This should be > > only one value, of only one (but any) type, per function. In this case > > we cannot use GET statements. I had an idea of enhancing the declaration, something > > like > > > DECLARE > > pcx PARTITION CONTEXT (int); -- read partition context > > BEGIN > > pcx := 10; -- set partition context > > > What do you think about it? > > Uh, what? I don't understand what this "partition context" is. > It was my name for access to the window partition's local memory - WinGetPartitionLocalMemory. We need some interface for this cache. Regards Pavel > > regards, tom lane >
Re: poc - possibility to write window function in PL languages
Pavel Stehule writes: > The second question is how to work with the partition context value. This should be > only one value, of only one (but any) type, per function. In this case > we cannot use GET statements. I had an idea of enhancing the declaration, something > like > DECLARE > pcx PARTITION CONTEXT (int); -- read partition context > BEGIN > pcx := 10; -- set partition context > What do you think about it? Uh, what? I don't understand what this "partition context" is. regards, tom lane
Re: strange error reporting
On Wed, Jan 20, 2021 at 1:54 PM Tom Lane wrote: > Alvaro Herrera writes: > > On 2021-Jan-20, Robert Haas wrote: > >> I figured it was something like that. I don't know whether the right > >> thing is to use something like PQdb() to get the correct database > >> name, or whether we should go with Tom's suggestion and omit that > >> detail altogether, but I think showing the empty string when the user > >> relied on the default is too confusing. > > > Well, the patch seems small enough, and I don't think it'll be in any > > way helpful to omit that detail. > > I'm +1 for applying and back-patching that. I still think we might > want to just drop the phrase altogether in HEAD, but we wouldn't do > that in the back branches, and the message is surely misleading as-is. Sure, that makes sense. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Calculation of relids (pull_varnos result) for PlaceHolderVars
I wrote: > ... > 2. pull_varnos() is not passed the planner "root" data structure, > so it can't get at the PlaceHolderInfo list. We can change its > API of course, but that propagates to dozens of places. > ... > The 0001 patch attached goes ahead and makes those API changes. > I think this is perfectly reasonable to do in HEAD, but it most > likely is an unacceptable API/ABI break for the back branches. > ... > A third way is to preserve the existing pull_varnos() API in > the back branches, changing all the internal calls to use a > new function that has the additional "root" parameter. This > seems feasible but I've not attempted to code it yet. Here's a proposed fix that does it like that. The 0001 patch is the same as before, and then 0002 is a delta to be applied only in the back branches. What I did there was install a layer of macros in the relevant .c files that cause calls that look like the HEAD versions to be redirected to the "xxx_new" functions. The idea is to keep the actual code in sync with HEAD, for readability and to minimize back-patching pain. It could be argued that this is too cute and the internal references should just go to the "new" functions in the back branches. I did not bother to preserve ABI for these two functions: indexcol_is_bool_constant_for_query() build_implied_join_equality() because I judged it highly unlikely that any extensions are calling them. If anybody thinks differently, we could hack those in the same way. regards, tom lane diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index 2f2d4d171c..48c2f23ae0 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -5709,7 +5709,8 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel, * RestrictInfos, so we must make our own. 
*/ Assert(!IsA(expr, RestrictInfo)); - rinfo = make_restrictinfo(expr, + rinfo = make_restrictinfo(root, + expr, true, false, false, diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c index 4f5b870d1b..d263ecf082 100644 --- a/src/backend/optimizer/path/clausesel.c +++ b/src/backend/optimizer/path/clausesel.c @@ -227,7 +227,7 @@ clauselist_selectivity_ext(PlannerInfo *root, } else { -ok = (NumRelids(clause) == 1) && +ok = (NumRelids(root, clause) == 1) && (is_pseudo_constant_clause(lsecond(expr->args)) || (varonleft = false, is_pseudo_constant_clause(linitial(expr->args; @@ -609,7 +609,7 @@ bms_is_subset_singleton(const Bitmapset *s, int x) * restriction or join estimator. Subroutine for clause_selectivity(). */ static inline bool -treat_as_join_clause(Node *clause, RestrictInfo *rinfo, +treat_as_join_clause(PlannerInfo *root, Node *clause, RestrictInfo *rinfo, int varRelid, SpecialJoinInfo *sjinfo) { if (varRelid != 0) @@ -643,7 +643,7 @@ treat_as_join_clause(Node *clause, RestrictInfo *rinfo, if (rinfo) return (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE); else - return (NumRelids(clause) > 1); + return (NumRelids(root, clause) > 1); } } @@ -860,7 +860,7 @@ clause_selectivity_ext(PlannerInfo *root, OpExpr *opclause = (OpExpr *) clause; Oid opno = opclause->opno; - if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo)) + if (treat_as_join_clause(root, clause, rinfo, varRelid, sjinfo)) { /* Estimate selectivity for a join clause. 
*/ s1 = join_selectivity(root, opno, @@ -896,7 +896,7 @@ clause_selectivity_ext(PlannerInfo *root, funcclause->funcid, funcclause->args, funcclause->inputcollid, - treat_as_join_clause(clause, rinfo, + treat_as_join_clause(root, clause, rinfo, varRelid, sjinfo), varRelid, jointype, @@ -907,7 +907,7 @@ clause_selectivity_ext(PlannerInfo *root, /* Use node specific selectivity calculation function */ s1 = scalararraysel(root, (ScalarArrayOpExpr *) clause, - treat_as_join_clause(clause, rinfo, + treat_as_join_clause(root, clause, rinfo, varRelid, sjinfo), varRelid, jointype, diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index 380336518f..aab06c7d21 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -1858,7 +1858,7 @@ cost_incremental_sort(Path *path, * Check if the expression contains Var with "varno 0" so that we * don't call estimate_num_groups in that case. */ - if (bms_is_member(0, pull_varnos((Node *) member->em_expr))) + if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr))) { unknown_varno = true; break; diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index ab6eaaead1..0188c1e9a1 100644 ---
Re: [HACKERS] [PATCH] Generic type subscripting
> On Wed, Jan 20, 2021 at 11:34:16AM -0500, Dian M Fay wrote: > > Thanks, I need to remember not to skip doc building in the testing process > > even for such small changes. Hope this time I didn't forget anything. > > > > > On Wed, Jan 20, 2021 at 09:58:43AM -0500, Dian M Fay wrote: > > > > > Here's a full editing pass on the documentation, with v45 and Pavel's > > > doc-whitespaces-fix.patch applied. I also corrected a typo in one of the > > > added hints. > > > > Great! I've applied almost all of it, except: > > > > + A jsonb value will accept assignments to nonexistent > > subscript > > + paths as long as the nonexistent elements being traversed are all > > arrays. > > > > Maybe I've misunderstood the intention, but there is no requirement > > about arrays for creating such an empty path. I've formulated it as: > > > > + A jsonb value will accept assignments to nonexistent > > subscript > > + paths as long as the last existing path key is an object or an array. > > My intention there was to highlight the difference between: > > * SET obj['a']['b']['c'] = '"newvalue"' > * SET arr[0][0][3] = '"newvalue"' > > obj has to conform to {"a": {"b": {...}}} in order to receive the > assignment of the nested c. If it doesn't, that's the error case we > discussed earlier. But arr can be null, [], and so on, and any missing > structure [[[null, null, null, "newvalue"]]] will be created. If arr is 'null', or any other scalar value, such subscripting will work only one level deep, because scalars are represented internally as one-element arrays. If arr is '[]', the path will comply by definition. So it's essentially the same as for objects, with no particular difference. If such a quirk about scalars being treated like arrays is bothersome, we could bend it in this case as well (see the attached version).
From a4037c651a0cfd2448f38b6c8c932b5a09136b0a Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Fri, 18 Dec 2020 17:19:51 +0100
Subject: [PATCH v48 1/3] Subscripting for jsonb

Subscripting implementation for jsonb. It does not support slices, does not have a limit on the number of subscripts, and for assignment expects the replacement value to be of jsonb type. There is also one functional difference in assignment via subscripting from jsonb_set: when an original jsonb container is NULL, subscripting replaces it with an empty jsonb and proceeds with the assignment. For the sake of code reuse, some parts of jsonb functionality were rearranged to allow using the same functions for jsonb_set and the subscripting assignment operation. The original idea belongs to Oleg Bartunov.

Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
---
 doc/src/sgml/json.sgml              |  51
 src/backend/utils/adt/Makefile      |   1 +
 src/backend/utils/adt/jsonb_util.c  |  76 -
 src/backend/utils/adt/jsonbsubs.c   | 413
 src/backend/utils/adt/jsonfuncs.c   | 180 ++--
 src/include/catalog/pg_proc.dat     |   4 +
 src/include/catalog/pg_type.dat     |   3 +-
 src/include/utils/jsonb.h           |   6 +-
 src/test/regress/expected/jsonb.out | 272 +-
 src/test/regress/sql/jsonb.sql      |  84 +-
 10 files changed, 985 insertions(+), 105 deletions(-)
 create mode 100644 src/backend/utils/adt/jsonbsubs.c

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml index 5b9a5557a4..3ace5e444b 100644 --- a/doc/src/sgml/json.sgml +++ b/doc/src/sgml/json.sgml @@ -602,6 +602,57 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qu + + jsonb Subscripting + + The jsonb data type supports array-style subscripting expressions + to extract and modify elements. Nested values can be indicated by chaining + subscripting expressions, following the same rules as the path + argument in the jsonb_set function.
If a jsonb + value is an array, numeric subscripts start at zero, and negative integers count + backwards from the last element of the array. Slice expressions are not supported. + The result of a subscripting expression is always of the jsonb data type. + + + + An example of subscripting syntax: + + +-- Extract object value by key +SELECT ('{"a": 1}'::jsonb)['a']; + +-- Extract nested object value by key path +SELECT ('{"a": {"b": {"c": 1}}}'::jsonb)['a']['b']['c']; + +-- Extract array element by index +SELECT ('[1, "2", null]'::jsonb)[1]; + +-- Update object value by key. Note the quotes around '1': the assigned +-- value must be of the jsonb type as well +UPDATE table_name SET jsonb_field['key'] = '1'; + +-- Filter records using a WHERE clause with subscripting. Since the result of +-- subscripting is jsonb, the value we compare it against must also be jsonb. +-- The double quotes make "value" also a valid jsonb string. +SELECT * FROM table_name WHERE jsonb_field['key'] = '"value"'; + + + jsonb assignment via subscripting handles a few edge cases + differently from jsonb_set. When a source jsonb + is NULL,
Re: Allow matching whole DN from a client certificate
On Mon, 2021-01-18 at 11:23 +0100, Daniel Gustafsson wrote: > + /* use commas instead of slashes */ > + X509_NAME_print_ex(bio, x509name, 0, XN_FLAG_SEP_COMMA_PLUS); > I don't have strong opinions on whether we should use slashes or commas, but I > think it needs to be documented that commas are required, since slashes are the > more common way to print a DN. There's also a XN_FLAG_RFC2253 flag, which claims to print in an RFC-compatible escape system. (It includes XN_FLAG_SEP_COMMA_PLUS.) I think NSS uses a similar RFC-based escape scheme, but I haven't checked to see whether it's fully compatible. I think you'll want to be careful to specify the format as much as possible, both to make sure that other backend TLS implementations can actually use the same escaping system and to ensure that user regexes don't suddenly start matching different things at some point in the future. As a cautionary tale, nginx is stuck with two versions of their similar feature (ssl_client_s_dn_legacy vs ssl_client_s_dn) after their switch to an RFC-compatible system [1]. Even when using RFC 2253 (now 4514) escaping, there are things left to the implementation, such as the order of AVAs inside multivalued RDNs. The RFC warns not to consider these representations to be canonical forms, so it's possible that switching (or upgrading) the TLS library in use could force users to change their regexes at some point in the future. I'm going to test this patch with some UTF-8 DNs later today; I'll share my findings. I'm also going to read up on [2] a bit more. --Jacob [1] https://serverfault.com/questions/829754/why-did-the-format-of-nginx-ssl-client-i-dn-suddenly-change [2] https://frasertweedale.github.io/blog-redhat/posts/2019-05-28-a-dn-is-not-a-string.html
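For concreteness, the two renderings under discussion for one and the same certificate subject look roughly like this (values are illustrative); note that RFC 2253 output also reverses the RDN order, which matters for any user-supplied regex:

```
legacy OpenSSL one-line form:    /C=US/O=Example/OU=Eng/CN=alice
RFC 2253 form (XN_FLAG_RFC2253): CN=alice,OU=Eng,O=Example,C=US
```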
catalogs.sgml documentation ambiguity
Some catalog tables have references to pg_attribute.attnum. In the documentation, it only says "(references pg_attribute.attnum)", but not which oid column to include in the two-column "foreign key". This would not be a problem if there were only one reference to pg_class.oid, but some catalog tables have multiple columns that reference pg_class.oid. For instance, pg_constraint has two columns (conkey, confkey) referencing pg_attribute, and three columns (conrelid, conindid, confrelid) referencing pg_class. A user might wonder: which one of these three columns should be used in combination with the conkey/confkey elements to join pg_attribute? If we had array foreign key support, I would guess the "foreign keys" should be:

FOREIGN KEY (confrelid, EACH ELEMENT OF confkey) REFERENCES pg_catalog.pg_attribute (attrelid, attnum)
FOREIGN KEY (conrelid, EACH ELEMENT OF conkey) REFERENCES pg_catalog.pg_attribute (attrelid, attnum)

It's of course harder to guess for a machine, though, which would need a separate human-produced lookup table. Could it be meaningful to clarify these multi-key relations in the documentation? As a bonus, machines could then parse the information out of catalogs.sgml. Here is a list of catalogs referencing pg_attribute that have multiple pg_class references:

      table_name      |               array_agg
----------------------+---------------------------------------
 pg_constraint        | {confrelid,conindid,conrelid}
 pg_index             | {indexrelid,indrelid}
 pg_partitioned_table | {partdefid,partrelid}
 pg_trigger           | {tgconstrindid,tgconstrrelid,tgrelid}
(4 rows)

Produced using this query:

SELECT b.table_name, array_agg(DISTINCT b.column_name)
FROM pit.oid_joins AS a
JOIN pit.oid_joins AS b ON b.table_name = a.table_name
WHERE a.ref_table_name = 'pg_attribute'
AND b.ref_table_name = 'pg_class'
GROUP BY b.table_name
HAVING cardinality(array_agg(DISTINCT b.column_name)) > 1;
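As a worked example of the ambiguity: for pg_constraint, conkey pairs with conrelid (and confkey with confrelid), which the docs currently leave implicit. A query resolving a constraint's column numbers to names might look like:

```sql
SELECT c.conname, a.attname
FROM pg_catalog.pg_constraint AS c
JOIN pg_catalog.pg_attribute AS a
  ON a.attrelid = c.conrelid          -- conkey's companion oid column,
 AND a.attnum = ANY (c.conkey)        -- not conindid or confrelid
WHERE c.contype = 'f';
```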
Re: strange error reporting
Alvaro Herrera writes: > On 2021-Jan-20, Robert Haas wrote: >> I figured it was something like that. I don't know whether the right >> thing is to use something like PQdb() to get the correct database >> name, or whether we should go with Tom's suggestion and omit that >> detail altogether, but I think showing the empty string when the user >> relied on the default is too confusing. > Well, the patch seems small enough, and I don't think it'll be in any > way helpful to omit that detail. I'm +1 for applying and back-patching that. I still think we might want to just drop the phrase altogether in HEAD, but we wouldn't do that in the back branches, and the message is surely misleading as-is. regards, tom lane
Re: Deleting older versions in unique indexes to avoid page splits
On Wed, Jan 20, 2021 at 5:33 AM Amit Kapila wrote: > > Victor independently came up with a benchmark that ran over several > > hours, with cleanup consistently held back by ~5 minutes by a long > > running transaction: > > > > AFAICS, the long-running transaction used in the test is below: > SELECT abalance, pg_sleep(300) FROM pgbench_accounts WHERE mtime > > now() - INTERVAL '15min' ORDER BY aid LIMIT 1; > > This shouldn't generate a transaction id so would it be sufficient to > hold back the clean-up? It will hold back clean-up because it holds open a snapshot. Whether or not the long running transaction has been allocated a true XID (not just a VXID) is irrelevant. Victor's test case is perfectly valid. In general there are significant benefits for cases with long-running transactions, which will be quite apparent if you do something simple like run pgbench (a script with non-HOT updates) while a REPEATABLE READ transaction runs in psql (and has actually acquired a snapshot by running a simple query -- the xact snapshot is acquired lazily). I understand that this will be surprising if you believe that the problem in these cases is strictly that there are too many "recently dead" versions that still need to be stored (i.e. versions that cleanup simply isn't able to remove, given the xmin horizon invariant). But it's now clear that that's not what really happens in most cases with a long running transaction. What we actually see is that local page-level inefficiencies in cleanup were (and perhaps to some degree still are) a much bigger problem than the inherent inability of cleanup to remove even one or two tuples. This is at least true until the bloat problem becomes a full-blown disaster (because cleanup really is inherently restricted by the global xmin horizon, and therefore hopelessly behind). In reality there are seldom that many individual logical rows that get updated more than a few times in (say) any given one hour period. 
This is true even with skewed updates -- the skew is almost never visible at the level of an individual leaf page. The behavior we see now that we have bottom-up index deletion is much closer to the true optimal behavior for the general approach Postgres takes to cleanup of garbage tuples, since it tends to make the number of versions for any given logical row as low as possible (within the confines of the global xmin horizon limitations for cleanup). Of course it would also be helpful to have something like zheap -- some mechanism that can store "recently dead" versions some place where they at least don't bloat up the main relation structures. But that's only one part of the big picture for Postgres MVCC garbage. We should not store garbage tuples (i.e. those that are dead rather than just recently dead) *anywhere*. > First, it is not clear to me if that has properly simulated the > long-running test but even if it is what I intend to say was to have > an open long-running transaction possibly for the entire duration of > the test? If we do that, we will come to know if there is any overhead > and if so how much? I am confident that you are completely wrong about regressing cases with long-running transactions, except perhaps in some very narrow sense that is of little practical relevance. Victor's test case did result in a small loss of throughput, for example, but that's a small price to pay to not have your indexes explode (note also that most of the indexes weren't even used for reads, so in the real world it would probably also improve throughput even in the short term). FWIW the small drop in TPS probably had little to do with the cost of visiting the heap for visibility information. Workloads must be made to "live within their means". You can call that a regression if you like, but I think that almost anybody else would take issue with that characterization. 
Slowing down non-HOT updaters in these extreme cases may actually be a good thing, even when bottom-up deletion finally becomes ineffective. It can be thought of as backpressure. I am not worried about slowing down something that is already hopelessly inefficient and unsustainable. I'd go even further than that, in fact -- I now wonder if we should *deliberately* slow them down some more! > Test with 2 un-modified indexes > === > create table t1(c1 int, c2 int, c3 int, c4 char(10)); > create index idx_t1 on t1(c1); > create index idx_t2 on t1(c2); > create index idx_t3 on t1(c3); > > insert into t1 values(generate_series(1,500), 1, 10, 'aa'); > update t1 set c2 = 2; > postgres=# update t1 set c2 = 2; > UPDATE 500 > Time: 46533.530 ms (00:46.534) > > With HEAD > == > postgres=# update t1 set c2 = 2; > UPDATE 500 > Time: 52529.839 ms (00:52.530) > > I have dropped and recreated the table after each update in the test. Good thing that you remembered to drop and recreate the table, since otherwise bottom-up index deletion would look really good! Besides,
Re: strange error reporting
On 2021-Jan-20, Robert Haas wrote:

> On Wed, Jan 20, 2021 at 1:25 PM Alvaro Herrera wrote:
> > That's because pgbench reports the input argument dbname, but since you
> > didn't specify anything, then PQconnectdbParams() uses the libpq
> > behavior. I think we'd have to use PQdb() instead.
>
> I figured it was something like that. I don't know whether the right
> thing is to use something like PQdb() to get the correct database
> name, or whether we should go with Tom's suggestion and omit that
> detail altogether, but I think showing the empty string when the user
> relied on the default is too confusing.

Well, the patch seems small enough, and I don't think it'll be in any way helpful to omit that detail.

--
Álvaro Herrera  39°49'30"S 73°17'W
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index f7da3e1f62..30fde9e9ec 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -1226,7 +1226,7 @@ doConnect(void)
 	if (PQstatus(conn) == CONNECTION_BAD)
 	{
 		pg_log_error("connection to database \"%s\" failed: %s",
-					 dbName, PQerrorMessage(conn));
+					 PQdb(conn), PQerrorMessage(conn));
 		PQfinish(conn);
 		return NULL;
 	}
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On 2021-Jan-20, Alexey Kondratov wrote:

> On 2021-01-20 21:08, Alexey Kondratov wrote:
> >
> > I did a refactoring of ATExecSetTableSpaceNoStorage() in the 0001. New
> > function SetRelTablesapce() is placed into the tablecmds.c. Following
> > 0002 gets use of it. Is it close to what you and Michael suggested?
>
> Ugh, forgot to attach the patches. Here they are.

Yeah, looks reasonable.

> +	/* No work if no change in tablespace. */
> +	oldTablespaceOid = rd_rel->reltablespace;
> +	if (tablespaceOid != oldTablespaceOid ||
> +		(tablespaceOid == MyDatabaseTableSpace && OidIsValid(oldTablespaceOid)))
> +	{
> +		/* Update the pg_class row. */
> +		rd_rel->reltablespace = (tablespaceOid == MyDatabaseTableSpace) ?
> +			InvalidOid : tablespaceOid;
> +		CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
> +
> +		changed = true;
> +	}
> +
> +	if (changed)
> +		/* Record dependency on tablespace */
> +		changeDependencyOnTablespace(RelationRelationId,
> +									 reloid, rd_rel->reltablespace);

Why have a separate "if (changed)" block here instead of merging with the above?

--
Álvaro Herrera  39°49'30"S 73°17'W
Re: strange error reporting
On Wed, Jan 20, 2021 at 1:25 PM Alvaro Herrera wrote: > That's because pgbench reports the input argument dbname, but since you > didn't specify anything, then PQconnectdbParams() uses the libpq > behavior. I think we'd have to use PQdb() instead. I figured it was something like that. I don't know whether the right thing is to use something like PQdb() to get the correct database name, or whether we should go with Tom's suggestion and omit that detail altogether, but I think showing the empty string when the user relied on the default is too confusing. -- Robert Haas EDB: http://www.enterprisedb.com
Re: strange error reporting
On 2021-Jan-20, Robert Haas wrote: > On Wed, Jan 20, 2021 at 12:19 PM Tom Lane wrote: > > Robert Haas writes: > > > I just made the mistake of trying to run pgbench without first running > > > createdb and got this: > > > > > pgbench: error: connection to database "" failed: could not connect to > > > socket "/tmp/.s.PGSQL.5432": FATAL: database "rhaas" does not exist > > > > > This looks pretty bogus because (1) I was not attempting to connect to > > > a database whose name is the empty string [...] > > > > I'm not sure about the empty DB name in the first part (presumably > > that's from pgbench, so what was your pgbench command exactly?). > > I think it was just 'pgbench -i 40'. For sure, I didn't specify a database > name. That's because pgbench reports the input argument dbname, but since you didn't specify anything, then PQconnectdbParams() uses the libpq behavior. I think we'd have to use PQdb() instead. -- Álvaro Herrera Valdivia, Chile
Re: Jsonpath ** vs lax mode
On 2021-Jan-20, Alexander Korotkov wrote: > My proposal is to make everything after the ** operator use strict mode > (patch attached). I think this shouldn't be backpatched, just applied to > the v14. Other suggestions? I think changing the mode midway through the operation is strange. What do you think of requiring for ** that mode is strict? That is, if ** is used and the mode is lax, an error is thrown. Thanks -- Álvaro Herrera Valdivia, Chile
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On 2021-01-20 21:08, Alexey Kondratov wrote: On 2021-01-20 18:54, Alvaro Herrera wrote: On 2021-Jan-20, Alvaro Herrera wrote: On 2021-Jan-20, Michael Paquier wrote: > +/* > + * This is mostly duplicating ATExecSetTableSpaceNoStorage, > + * which should maybe be factored out to a library function. > + */ > Wouldn't it be better to do first the refactoring of 0002 and then > 0001 so as REINDEX can use the new routine, instead of putting that > into a comment? I think merging 0001 and 0002 into a single commit is a reasonable approach. ... except it doesn't make a lot of sense to have set_rel_tablespace in either indexcmds.c or index.c. I think tablecmds.c is a better place for it. (I would have thought catalog/storage.c, but that one's not the right abstraction level it seems.) I did a refactoring of ATExecSetTableSpaceNoStorage() in the 0001. New function SetRelTablesapce() is placed into the tablecmds.c. Following 0002 gets use of it. Is it close to what you and Michael suggested? Ugh, forgot to attach the patches. Here they are. -- AlexeyFrom 2c3876f99bc8ebdd07c532619992e7ec3093e50a Mon Sep 17 00:00:00 2001 From: Alexey Kondratov Date: Mon, 23 Mar 2020 21:10:29 +0300 Subject: [PATCH v2 2/2] Allow REINDEX to change tablespace REINDEX already does full relation rewrite, this patch adds a possibility to specify a new tablespace where new relfilenode will be created. --- doc/src/sgml/ref/reindex.sgml | 22 + src/backend/catalog/index.c | 72 ++- src/backend/commands/indexcmds.c | 68 ++- src/bin/psql/tab-complete.c | 4 +- src/include/catalog/index.h | 2 + src/test/regress/input/tablespace.source | 53 +++ src/test/regress/output/tablespace.source | 102 ++ 7 files changed, 318 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml index 627b36300c..4f84060c4d 100644 --- a/doc/src/sgml/ref/reindex.sgml +++ b/doc/src/sgml/ref/reindex.sgml @@ -27,6 +27,7 @@ REINDEX [ ( option [, ...] 
) ] { IN CONCURRENTLY [ boolean ] VERBOSE [ boolean ] +TABLESPACE new_tablespace @@ -187,6 +188,19 @@ REINDEX [ ( option [, ...] ) ] { IN + +TABLESPACE + + + This specifies that indexes will be rebuilt on a new tablespace. + Cannot be used with "mapped" relations. If SCHEMA, + DATABASE or SYSTEM is specified, then + all unsuitable relations will be skipped and a single WARNING + will be generated. + + + + VERBOSE @@ -210,6 +224,14 @@ REINDEX [ ( option [, ...] ) ] { IN + +new_tablespace + + + The tablespace where indexes will be rebuilt. + + + diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index b8cd35e995..ed98b17483 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -57,6 +57,7 @@ #include "commands/event_trigger.h" #include "commands/progress.h" #include "commands/tablecmds.h" +#include "commands/tablespace.h" #include "commands/trigger.h" #include "executor/executor.h" #include "miscadmin.h" @@ -1394,9 +1395,13 @@ index_update_collation_versions(Oid relid, Oid coll) * Create concurrently an index based on the definition of the one provided by * caller. The index is inserted into catalogs and needs to be built later * on. This is called during concurrent reindex processing. + * + * "tablespaceOid" is the new tablespace to use for this index. If + * InvalidOid, use the tablespace in-use instead. */ Oid -index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId, const char *newName) +index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId, + Oid tablespaceOid, const char *newName) { Relation indexRelation; IndexInfo *oldInfo, @@ -1526,7 +1531,8 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId, const char newInfo, indexColNames, indexRelation->rd_rel->relam, - indexRelation->rd_rel->reltablespace, + OidIsValid(tablespaceOid) ? 
+tablespaceOid : indexRelation->rd_rel->reltablespace, indexRelation->rd_indcollation, indclass->values, indcoloptions->values, @@ -3591,6 +3597,8 @@ IndexGetRelation(Oid indexId, bool missing_ok) /* * reindex_index - This routine is used to recreate a single index + * + * See comments of reindex_relation() for details about "tablespaceOid". */ void reindex_index(Oid indexId, bool skip_constraint_checks, char persistence, @@ -3603,6 +3611,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence, volatile bool skipped_constraint = false; PGRUsage ru0; bool progress = ((params->options & REINDEXOPT_REPORT_PROGRESS) != 0); + bool set_tablespace = OidIsValid(params->tablespaceOid); pg_rusage_init();
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On 2021-01-20 18:54, Alvaro Herrera wrote: On 2021-Jan-20, Alvaro Herrera wrote: On 2021-Jan-20, Michael Paquier wrote: > +/* > + * This is mostly duplicating ATExecSetTableSpaceNoStorage, > + * which should maybe be factored out to a library function. > + */ > Wouldn't it be better to do first the refactoring of 0002 and then > 0001 so as REINDEX can use the new routine, instead of putting that > into a comment? I think merging 0001 and 0002 into a single commit is a reasonable approach. ... except it doesn't make a lot of sense to have set_rel_tablespace in either indexcmds.c or index.c. I think tablecmds.c is a better place for it. (I would have thought catalog/storage.c, but that one's not the right abstraction level it seems.) I did a refactoring of ATExecSetTableSpaceNoStorage() in the 0001. New function SetRelTablesapce() is placed into the tablecmds.c. Following 0002 gets use of it. Is it close to what you and Michael suggested? But surely ATExecSetTableSpaceNoStorage should be using this new routine. (I first thought 0002 was doing that, since that commit is calling itself a "refactoring", but now that I look closer, it's not.) Yeah, this 'refactoring' was initially referring to refactoring of what Justin added to one of the previous 0001. And it was meant to be merged with 0001, once agreed, but we got distracted by other stuff. I have not yet addressed Michael's concerns regarding reindex of partitions. I am going to look closer on it tomorrow. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
Re: strange error reporting
On Wed, Jan 20, 2021 at 12:47 PM Tom Lane wrote: > Fair. One possibility, which'd take a few more cycles in libpq but > likely not anything significant, is to replace "could not connect to ..." > with "while connecting to ..." once we're past the connect() per se. Yeah. I think this is kind of a client-side version of errcontext(), except we don't really have that context formally, so we're trying to figure out how to fake it in specific cases. > > Maybe it would be better if it said: > > > connection to database at socket "/tmp/.s.PGSQL.5432" failed: FATAL: > > database "rhaas" does not exist > > I'd be inclined to spell it "connection to server at ... failed", > but that sort of wording is surely also possible. "connection to server" rather than "connection to database" works for me; in fact, I think I like it slightly better. -- Robert Haas EDB: http://www.enterprisedb.com
Re: strange error reporting
Robert Haas writes: > On Wed, Jan 20, 2021 at 12:19 PM Tom Lane wrote: >> But the 'could not connect to socket' part is a consequence of my >> recent fiddling with libpq's connection failure reporting, see >> 52a10224e. We could discuss exactly how that ought to be spelled, >> but the idea is to consistently identify the host that we were trying >> to connect to. If you have a multi-host connection string, it's >> conceivable that "rhaas" exists on some of those hosts and not others, >> so I do not think the info is irrelevant. > I'm not saying that which socket I used is totally irrelevant although > in most cases it's going to be a lot of detail. I'm just saying that, > at least for me, when you say you can't connect to a socket, I at > least think about the return value of connect(2), which was clearly 0 > here. Fair. One possibility, which'd take a few more cycles in libpq but likely not anything significant, is to replace "could not connect to ..." with "while connecting to ..." once we're past the connect() per se. > Maybe it would be better if it said: > connection to database at socket "/tmp/.s.PGSQL.5432" failed: FATAL: > database "rhaas" does not exist I'd be inclined to spell it "connection to server at ... failed", but that sort of wording is surely also possible. regards, tom lane
Re: Getting column names/types from select query?
> Where do you need this information? I'm writing some code that takes a given query, and generates type-safe bindings for it, so people can write SQL queries and get structs (or vectors of structs) out the other end. So I'm pretty flexible about where I get it, given that it'll be part of my build/codegen process. I hadn't seen libpq yet, I'll look into that — thanks!
Re: strange error reporting
On Wed, Jan 20, 2021 at 12:19 PM Tom Lane wrote: > Robert Haas writes: > > I just made the mistake of trying to run pgbench without first running > > createdb and got this: > > > pgbench: error: connection to database "" failed: could not connect to > > socket "/tmp/.s.PGSQL.5432": FATAL: database "rhaas" does not exist > > > This looks pretty bogus because (1) I was not attempting to connect to > > a database whose name is the empty string and (2) saying that it > > couldn't connect to the socket is wrong, else it would not also be > > showing a server message. > > I'm not sure about the empty DB name in the first part (presumably > that's from pgbench, so what was your pgbench command exactly?). I think it was just 'pgbench -i 40'. For sure, I didn't specify a database name. > But the 'could not connect to socket' part is a consequence of my > recent fiddling with libpq's connection failure reporting, see > 52a10224e. We could discuss exactly how that ought to be spelled, > but the idea is to consistently identify the host that we were trying > to connect to. If you have a multi-host connection string, it's > conceivable that "rhaas" exists on some of those hosts and not others, > so I do not think the info is irrelevant. I'm not saying that which socket I used is totally irrelevant although in most cases it's going to be a lot of detail. I'm just saying that, at least for me, when you say you can't connect to a socket, I at least think about the return value of connect(2), which was clearly 0 here. > Just looking at this, I wonder if we ought to drop pgbench's > contribution to the message entirely; it seems like libpq's > message is now fairly freestanding. Maybe it would be better if it said: connection to database at socket "/tmp/.s.PGSQL.5432" failed: FATAL: database "rhaas" does not exist -- Robert Haas EDB: http://www.enterprisedb.com
Re: Phrase search vs. multi-lexeme tokens
On Thu, Jan 7, 2021 at 6:36 AM Alexander Korotkov wrote: > > > I read your patch over quickly and it seems like a reasonable > > approach (but sadly underdocumented). Can we extend the idea > > to fix the to_tsquery case? > > Sure, I'll provide a revised patch. The next version of the patch is attached. Now, it just makes to_tsquery() and websearch_to_tsquery() use phrase operator to connect multiple lexemes of the same tsquery token. I leave plainto_tsquery() aside because it considers the whole argument as a single token. Changing it would make it an equivalent of phraseto_tsquery(). -- Regards, Alexander Korotkov tsquery_phrase_fix.path Description: Binary data
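To make the intended change concrete, the before/after difference for a compound token looks something like this (a sketch only; the exact lexemes depend on the text search configuration, and the outputs shown are illustrative rather than taken from the patch):

```sql
-- A compound token like 'pg_class' is split into two lexemes.
-- Without the patch, to_tsquery() joins them with AND:
SELECT to_tsquery('english', 'pg_class');
-- roughly: 'pg' & 'class'

-- With the patch, the lexemes of a single token are joined with the
-- phrase operator instead, so matches must respect adjacency:
-- roughly: 'pg' <-> 'class'
```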
Re: strange error reporting
Robert Haas writes: > I just made the mistake of trying to run pgbench without first running > createdb and got this: > pgbench: error: connection to database "" failed: could not connect to > socket "/tmp/.s.PGSQL.5432": FATAL: database "rhaas" does not exist > This looks pretty bogus because (1) I was not attempting to connect to > a database whose name is the empty string and (2) saying that it > couldn't connect to the socket is wrong, else it would not also be > showing a server message. I'm not sure about the empty DB name in the first part (presumably that's from pgbench, so what was your pgbench command exactly?). But the 'could not connect to socket' part is a consequence of my recent fiddling with libpq's connection failure reporting, see 52a10224e. We could discuss exactly how that ought to be spelled, but the idea is to consistently identify the host that we were trying to connect to. If you have a multi-host connection string, it's conceivable that "rhaas" exists on some of those hosts and not others, so I do not think the info is irrelevant. Just looking at this, I wonder if we ought to drop pgbench's contribution to the message entirely; it seems like libpq's message is now fairly freestanding. regards, tom lane
Jsonpath ** vs lax mode
Hi!

We have a bug report which says that jsonpath ** operator behaves strangely in the lax mode [1]. Naturally, the result of this query looks counter-intuitive.

# select jsonb_path_query_array('[{"a": 1, "b": [{"a": 2}]}]', 'lax $.**.a');
 jsonb_path_query_array
------------------------
 [1, 1, 2, 2]
(1 row)

But actually, everything works as designed. ** operator reports both objects and wrapping arrays, while object key accessor automatically unwraps arrays.

# select x, jsonb_path_query_array(x, '$.a') from jsonb_path_query('[{"a": 1, "b": [{"a": 2}]}]', 'lax $.**') x;
              x              | jsonb_path_query_array
-----------------------------+------------------------
 [{"a": 1, "b": [{"a": 2}]}] | [1]
 {"a": 1, "b": [{"a": 2}]}   | [1]
 1                           | []
 [{"a": 2}]                  | [2]
 {"a": 2}                    | [2]
 2                           | []
(6 rows)

At first sight, we may just say that lax mode just sucks and counter-intuitive results are expected. But at second sight, the lax mode is used by default and the current behavior may look too surprising.

My proposal is to make everything after the ** operator use strict mode (patch attached). I think this shouldn't be backpatched, just applied to v14. Other suggestions?

Links
1. https://www.postgresql.org/message-id/16828-2b0229babfad2d8c%40postgresql.org

--
Regards,
Alexander Korotkov

jsonpath_double_star_strict_mode.patch
Description: Binary data
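Under the proposal — everything after ** evaluated in strict mode — the key accessor would no longer unwrap the wrapping arrays, so the duplicate matches should disappear. A sketch of the expected (not verified) behavior with the patch applied:

```sql
select jsonb_path_query_array('[{"a": 1, "b": [{"a": 2}]}]', 'lax $.**.a');
-- expected with the patch: [1, 2]
-- (the wrapping arrays seen by ** no longer have .a applied to their
-- elements a second time via lax-mode auto-unwrapping)
```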
strange error reporting
I just made the mistake of trying to run pgbench without first running createdb and got this: pgbench: error: connection to database "" failed: could not connect to socket "/tmp/.s.PGSQL.5432": FATAL: database "rhaas" does not exist This looks pretty bogus because (1) I was not attempting to connect to a database whose name is the empty string and (2) saying that it couldn't connect to the socket is wrong, else it would not also be showing a server message. I haven't investigated why this is happening; apologies if this is a known issue. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Odd, intermittent failure in contrib/pageinspect
Michael Paquier writes: > On Tue, Jan 19, 2021 at 05:03:49PM -0500, Tom Lane wrote: >> In short I propose the attached patch, which also gets rid of >> that duplicate query. > Agreed, +1. Pushed, thanks for looking at it. regards, tom lane
Re: Getting column names/types from select query?
"Wesley Aptekar-Cassels" writes: > I am interested in figuring out how to get the names and types of the > columns from an arbitrary query. Where do you need this information? Usually the easiest way is to prepare (plan) the query and then extract metadata, for instance PQprepare and PQdescribePrepared if using libpq. regards, tom lane
Re: [HACKERS] [PATCH] Generic type subscripting
On Wed Jan 20, 2021 at 11:22 AM EST, Dmitry Dolgov wrote:
> > On Tue Jan 19, 2021 at 1:42 PM EST, Pavel Stehule wrote:
> > >
> > > I found minor issues.
> > >
> > > Doc - missing tag
> > >
> > > and three whitespaces issues
> > >
> > > see attached patch
>
> Thanks, I need to remember to not skip doc building for the testing process
> even for such small changes. Hope now I didn't forget anything.
>
> > On Wed, Jan 20, 2021 at 09:58:43AM -0500, Dian M Fay wrote:
> > > Here's a full editing pass on the documentation, with v45 and Pavel's
> > > doc-whitespaces-fix.patch applied. I also corrected a typo in one of the
> > > added hints.
>
> Great! I've applied almost all of it, except:
>
> + A jsonb value will accept assignments to nonexistent subscript
> + paths as long as the nonexistent elements being traversed are all arrays.
>
> Maybe I've misunderstood the intention, but there is no requirement
> about arrays for creating such an empty path. I've formulated it as:
>
> + A jsonb value will accept assignments to nonexistent subscript
> + paths as long as the last existing path key is an object or an array.

My intention there was to highlight the difference between:

* SET obj['a']['b']['c'] = '"newvalue"'
* SET arr[0][0][3] = '"newvalue"'

obj has to conform to {"a": {"b": {...}}} in order to receive the assignment of the nested c. If it doesn't, that's the error case we discussed earlier. But arr can be null, [], and so on, and any missing structure [[[null, null, null, "newvalue"]]] will be created.

Take 2:

A jsonb value will accept assignments to nonexistent subscript paths as long as object key subscripts can be traversed as described above. The final subscript is not traversed and, if it describes a missing object key, will be created. Nested arrays will always be created and NULL-padded according to the path until the value can be placed appropriately.
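For reference, the two cases being contrasted might look like this as full statements (hypothetical table and column names; behavior as described for the in-progress subscripting patch, not a documented guarantee):

```sql
-- Object case: the intermediate keys must already conform to
-- {"a": {"b": {...}}}; otherwise the assignment raises an error.
UPDATE t SET obj['a']['b']['c'] = '"newvalue"';

-- Array case: missing structure is created and NULL-padded, so a NULL
-- or empty arr can become [[[null, null, null, "newvalue"]]].
UPDATE t SET arr[0][0][3] = '"newvalue"';
```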
Getting column names/types from select query?
Hi all,

I am interested in figuring out how to get the names and types of the columns from an arbitrary query. Essentially, I want to be able to take a query like:

CREATE TABLE foo(
    bar bigserial,
    baz varchar(256)
);

SELECT * FROM foo WHERE bar = 42;

and figure out programmatically that the select will return a column "bar" of type bigserial, and a column "baz" of type varchar(256). I would like this to work for more complex queries as well (joins, CTEs, etc).

I've found https://wiki.postgresql.org/wiki/Query_Parsing, which talks about related ways to hook into postgres, but that seems to only talk about the parse tree — a lot more detail and processing seems to be required in order to figure out the output types.

It seems like there should be somewhere I can hook into in postgres that will get me this information, but I have no familiarity with the codebase, so I don't know the best way to get this. How would you recommend that I approach this? I'm comfortable patching postgres if needed, although if there's a solution that doesn't require that, I'd prefer that.

Thanks,

:w