Decouple last important WAL record LSN from WAL insert locks
Hi,

While working on something else, I noticed that each WAL insert lock tracks its own last important WAL record's LSN (lastImportantAt), and both the bgwriter and checkpointer later compute the maximum, i.e. the server-wide last important WAL record's LSN, via GetLastImportantRecPtr(). While doing so, each WAL insertion lock is acquired in exclusive mode in a for loop. That looks like unnecessary overhead to me.

I quickly coded a patch (attached herewith) that tracks the server-wide last important WAL record's LSN in XLogCtlInsert (lastImportantPos), protected by a spinlock, and gets rid of lastImportantAt from each WAL insert lock (a rough sketch of the intended shape follows at the end of this message).

I ran pgbench with a simple insert workload [1] and the results are below. While the test was running, GetLastImportantRecPtr() was called 4-5 times.

 # of clients | HEAD | PATCHED
--------------+------+--------
            1 |   83 |     82
            2 |  159 |    157
            4 |  303 |    302
            8 |  576 |    570
           16 | 1104 |   1095
           32 | 2055 |   2041
           64 | 2286 |   2295
          128 | 2270 |   2285
          256 | 2302 |   2253
          512 | 2205 |   2290
          768 | 2224 |   2180
         1024 | 2109 |   2150
         2048 | 1941 |   1936
         4096 | 1856 |   1848

The patch doesn't seem to hurt this use-case, and there might be some benefit if the bgwriter and checkpointer get in the way of WAL inserters: with the patch, the extra exclusive-lock burden on the WAL insert locks is gone. Since the amount of work the WAL inserters do under the new spinlock is minimal (updating XLogCtlInsert->lastImportantPos), the spinlock may not be an issue. Also, it's worthwhile to look at the existing comment [2], which doesn't talk about the performance impact of having a lock.

Thoughts?

[1]
./configure --prefix=$PWD/inst/ CFLAGS="-O3" > install.log && make -j 8 install > install.log 2>&1 &
cd inst/bin
./pg_ctl -D data -l logfile stop
rm -rf data logfile insert.sql
free -m
sudo su -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
free -m
./initdb -D data
./pg_ctl -D data -l logfile start
./psql -d postgres -c 'ALTER SYSTEM SET shared_buffers = "8GB";'
./psql -d postgres -c 'ALTER SYSTEM SET max_wal_size = "32GB";'
./psql -d postgres -c 'ALTER SYSTEM SET max_connections = "4096";'
./psql -d postgres -c 'ALTER SYSTEM SET bgwriter_delay = "10ms";'
./pg_ctl -D data -l logfile restart
./pgbench -i -s 1 -d postgres
./psql -d postgres -c "ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey;"
cat << EOF >> insert.sql
\set aid random(1, 10 * :scale)
\set delta random(1, 10 * :scale)
INSERT INTO pgbench_accounts (aid, bid, abalance) VALUES (:aid, :aid, :delta);
EOF
for c in 1 2 4 8 16 32 64 128 256 512 768 1024 2048 4096; do echo -n "$c ";./pgbench -n -M prepared -U ubuntu postgres -b simple-update -c$c -j$c -T5 2>&1|grep '^tps'|awk '{print $3}';done

[2]
 * records. Tracking the WAL activity directly in WALInsertLock has the
 * advantage of not needing any additional locks to update the value.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

v1-0001-Decouple-last-important-WAL-record-LSN-from-WAL-i.patch
Description: Binary data
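For illustration, the shape of the change described above is roughly the following. This is a sketch only, not the attached patch: the helper name SetLastImportantRecPtr() and the dedicated spinlock are assumptions made for the example; only XLogCtlInsert, lastImportantPos and GetLastImportantRecPtr() come from the message itself.

    /* Sketch only: fields assumed to be added to XLogCtlInsert. */
    typedef struct XLogCtlInsert
    {
        /* ... existing fields ... */
        slock_t     important_lck;    /* assumed: protects lastImportantPos */
        XLogRecPtr  lastImportantPos; /* end LSN of last "important" record */
    } XLogCtlInsert;

    /*
     * Hypothetical helper, called once per "important" WAL record from the
     * insertion path instead of stamping the per-lock lastImportantAt field.
     * The compare-before-store keeps the shared value monotonic.
     */
    static void
    SetLastImportantRecPtr(XLogRecPtr EndPos)
    {
        XLogCtlInsert *Insert = &XLogCtl->Insert;

        SpinLockAcquire(&Insert->important_lck);
        if (Insert->lastImportantPos < EndPos)
            Insert->lastImportantPos = EndPos;
        SpinLockRelease(&Insert->important_lck);
    }

    /*
     * Readers (bgwriter/checkpointer) no longer sweep all WAL insertion
     * locks in exclusive mode; they just read the shared value.
     */
    XLogRecPtr
    GetLastImportantRecPtr(void)
    {
        XLogCtlInsert *Insert = &XLogCtl->Insert;
        XLogRecPtr  res;

        SpinLockAcquire(&Insert->important_lck);
        res = Insert->lastImportantPos;
        SpinLockRelease(&Insert->important_lck);

        return res;
    }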
Re: Perform streaming logical transactions by background workers and parallel apply
On Tue, Nov 22, 2022 at 7:30 AM houzj.f...@fujitsu.com wrote:
>

Few minor comments and questions:

1.
+static void
+LogicalParallelApplyLoop(shm_mq_handle *mqh)
+{
+	for (;;)
+	{
+		void	   *data;
+		Size		len;
+
+		ProcessParallelApplyInterrupts();
...
...
+		if (rc & WL_LATCH_SET)
+		{
+			ResetLatch(MyLatch);
+			ProcessParallelApplyInterrupts();
+		}
...
}

Why is ProcessParallelApplyInterrupts() called twice in
LogicalParallelApplyLoop()?

2.
+ * This scenario is similar to the first case but TX-1 and TX-2 are executed by
+ * two parallel apply workers (PA-1 and PA-2 respectively). In this scenario,
+ * PA-2 is waiting for PA-1 to complete its transaction while PA-1 is waiting
+ * for subsequent input from LA. Also, LA is waiting for PA-2 to complete its
+ * transaction in order to preserve the commit order. There is a deadlock among
+ * three processes.
+ *
...
...
+ *
+ * LA (waiting to acquire the local transaction lock) -> PA-1 (waiting to
+ * acquire the lock on the unique index) -> PA-2 (waiting to acquire the lock
+ * on the remote transaction) -> LA
+ *

Isn't the order of PA-1 and PA-2 in the second paragraph different from
the first one?

3.
+ * Deadlock-detection
+ * --

It may be better to keep the title of this section as Locking Considerations.

4. In the section mentioned in Point 3, it would be better to separately
explain why we need session-level locks instead of transaction-level ones.

5. Add the below comments in the code:

diff --git a/src/backend/replication/logical/applyparallelworker.c b/src/backend/replication/logical/applyparallelworker.c
index 9385afb6d2..56f00defcf 100644
--- a/src/backend/replication/logical/applyparallelworker.c
+++ b/src/backend/replication/logical/applyparallelworker.c
@@ -431,6 +431,9 @@ pa_free_worker_info(ParallelApplyWorkerInfo *winfo)
 	if (winfo->dsm_seg != NULL)
 		dsm_detach(winfo->dsm_seg);

+	/*
+	 * Ensure this worker information won't be reused during worker allocation.
+	 */
 	ParallelApplyWorkersList = list_delete_ptr(ParallelApplyWorkersList,
 											   winfo);
@@ -762,6 +765,10 @@ HandleParallelApplyMessage(ParallelApplyWorkerInfo *winfo, StringInfo msg)
 	 */
 	error_context_stack = apply_error_context_stack;

+	/*
+	 * The actual error must be already reported by parallel apply
+	 * worker.
+	 */
 	ereport(ERROR,
 			(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 			 errmsg("parallel apply worker exited abnormally"),

--
With Regards,
Amit Kapila.
Re: Hash index build performance tweak from sorting
On Wed, 23 Nov 2022 at 13:04, David Rowley wrote:
> After getting rid of the HashInsertState code and just adding bool
> sorted to _hash_doinsert() and _hash_pgaddtup(), the resulting patch
> is much more simple:

Seems good to me and I wouldn't argue with any of your comments.

> and v4 includes 7 extra lines in hashinsert.c for the Assert() I
> mentioned in my previous email plus a bunch of extra comments.

Oh, I did already include that in v3 as requested.

> I'd rather see this solved like v4 is doing it.

Please do. No further comments. Thanks for your help

--
Simon Riggs
http://www.EnterpriseDB.com/
Re: Prefetch the next tuple's memory during seqscans
On Wed, 23 Nov 2022 at 21:26, sirisha chamarthi wrote:
> Master
> After vacuum:
> latency average = 393.880 ms
>
> Master + 0001 + 0005
> After vacuum:
> latency average = 369.591 ms

Thank you for running those again. Those results make more sense.

Would you mind also testing the count(*) query?

David
Re: Hash index build performance tweak from sorting
On Fri, 18 Nov 2022 at 03:34, Tomas Vondra wrote:
> I did some simple benchmark with v2 and v3, using the attached script,
> which essentially just builds hash index on random data, with different
> data types and maintenance_work_mem values. And what I see is this
> (median of 10 runs):

> So to me it seems v2 performs demonstrably better, v3 is consistently
> slower - not only compared to v2, but often also to master.

Could this just be down to code alignment changes? There don't really
seem to be any fundamental differences which would explain this.

David
Re: Hash index build performance tweak from sorting
On Wed, 16 Nov 2022 at 17:33, Simon Riggs wrote: > > Thanks for the review, apologies for the delay in acting upon your comments. > > My tests show the sorted and random tests are BOTH 4.6% faster with > the v3 changes using 5-test avg, but you'll be pleased to know your > kit is about 15.5% faster than mine, comparing absolute execution > times. Thanks for the updated patch. I started to look at this again and I'm starting to think that the HashInsertState struct is the wrong approach for passing along the sorted flag to _hash_doinsert(). The reason I think this is that in hashbuild() when setting buildstate.spool to NULL, you're also making the decision about what to set the sorted flag to. However, in reality, we already know what we should be passing *every* time we call _hash_doinsert(). The only place where we can pass the sorted option as true is in _h_indexbuild() when we're doing the sorted version of the index build. Trying to make that decision any sooner seems error-prone. I understand you have made HashInsertState so that we don't need to add new parameters should we ever need to pass something else along, but I'm just thinking that if we ever need to add more, then we should just reconsider this in the future. I think for today, the better option is just to add the bool sorted as a parameter to _hash_doinsert() and pass it as true in the single case where it's valid to do so. That seems less likely that we'll inherit some options from some other place after some future modification and end up passing sorted as true when it should be false. Another reason I didn't like the HashInsertState idea is that in the v3 patch there's an HashInsertState in both HashBuildState and HSpool. Because in the normal insert path (hashinsert), we've neither a HashBuildState nor an HSpool, you're having to fake up a HashInsertStateData to pass something along to _hash_doinsert() in hashinsert(). When we're building an index, in the non-sorted index build case, you're always passing the HashInsertStateData from the HashBuildState, but when we're doing the sorted index build the one from HSpool is passed. In other words, in each of the 3 calls to _hash_doinsert(), the HashInsertStateData comes from a different place. Now, I do see that you've coded hashbuild() so both versions of the HashInsertState point to the same HashInsertStateData, but I find it unacceptable programming that in _h_spoolinit() the code palloc's the memory for the HSpool and you're setting the istate field to the HashInsertStateData that's on the stack. That just seems like a great way to end up having istate pointing to junk should the HSpool ever live beyond the hashbuild() call. If we really don't want HSpool to live beyond hashbuild(), then it too should be a local variable to hashbuild() instead of being palloc'ed in _h_spoolinit(). _h_spoolinit() could just be passed a pointer to the HSpool to populate. 
After getting rid of the HashInsertState code and just adding bool sorted to _hash_doinsert() and _hash_pgaddtup(), the resulting patch is much more simple: v3: src/backend/access/hash/hash.c | 19 --- src/backend/access/hash/hashinsert.c | 40 ++-- src/backend/access/hash/hashsort.c | 8 ++-- src/include/access/hash.h| 14 +++--- 4 files changed, 67 insertions(+), 14 deletions(-) v4: src/backend/access/hash/hash.c | 4 ++-- src/backend/access/hash/hashinsert.c | 40 src/backend/access/hash/hashsort.c | 3 ++- src/include/access/hash.h| 6 -- 4 files changed, 40 insertions(+), 13 deletions(-) and v4 includes 7 extra lines in hashinsert.c for the Assert() I mentioned in my previous email plus a bunch of extra comments. I'd rather see this solved like v4 is doing it. David diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c index c361509d68..77fd147f68 100644 --- a/src/backend/access/hash/hash.c +++ b/src/backend/access/hash/hash.c @@ -231,7 +231,7 @@ hashbuildCallback(Relation index, itup = index_form_tuple(RelationGetDescr(index), index_values, index_isnull); itup->t_tid = *tid; - _hash_doinsert(index, itup, buildstate->heapRel); + _hash_doinsert(index, itup, buildstate->heapRel, false); pfree(itup); } @@ -265,7 +265,7 @@ hashinsert(Relation rel, Datum *values, bool *isnull, itup = index_form_tuple(RelationGetDescr(rel), index_values, index_isnull); itup->t_tid = *ht_ctid; - _hash_doinsert(rel, itup, heapRel); + _hash_doinsert(rel, itup, heapRel, false); pfree(itup); diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c index 23907d2e5b..6718ff18f3 100644 --- a/src/backend/access/hash/hashinsert.c +++
Bug in MERGE's test for tables with rules
While playing around with rules and MERGE, I noticed that there is a bug in the way that it detects whether the target table has rules --- it uses rd_rel->relhasrules, which can be incorrect, since it might be set for a table that doesn't currently have rules, but did in the recent past. So it actually needs to examine rd_rules. Technically, I think that it would be sufficient to just test whether rd_rules is non-NULL, but I think it's more robust and readable to check rd_rules->numLocks, as in the attached patch. Regards, Dean diff --git a/src/backend/parser/parse_merge.c b/src/backend/parser/parse_merge.c new file mode 100644 index 7913523..62c2ff6 --- a/src/backend/parser/parse_merge.c +++ b/src/backend/parser/parse_merge.c @@ -182,7 +182,8 @@ transformMergeStmt(ParseState *pstate, M errmsg("cannot execute MERGE on relation \"%s\"", RelationGetRelationName(pstate->p_target_relation)), errdetail_relkind_not_supported(pstate->p_target_relation->rd_rel->relkind))); - if (pstate->p_target_relation->rd_rel->relhasrules) + if (pstate->p_target_relation->rd_rules != NULL && + pstate->p_target_relation->rd_rules->numLocks > 0) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot execute MERGE on relation \"%s\"", diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out new file mode 100644 index 624d0e5..ad1a8c9 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -3532,6 +3532,30 @@ MERGE INTO rule_merge2 t USING (SELECT 1 DELETE WHEN NOT MATCHED THEN INSERT VALUES (s.a, ''); +SELECT * FROM rule_merge2; + a | b +---+--- + 1 | +(1 row) + +-- and should be ok after dropping rules +DROP RULE rule1 ON rule_merge1; +DROP RULE rule2 ON rule_merge1; +DROP RULE rule3 ON rule_merge1; +MERGE INTO rule_merge1 t USING (SELECT 1 AS a) s + ON t.a = s.a + WHEN MATCHED AND t.a < 2 THEN + UPDATE SET b = b || ' updated by merge, had rules' + WHEN MATCHED AND t.a > 2 THEN + DELETE + WHEN NOT MATCHED THEN + INSERT VALUES (s.a, 'had rules'); +SELECT * FROM rule_merge1; + a | b +---+--- + 1 | had rules +(1 row) + -- -- Test enabling/disabling -- diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql new file mode 100644 index bfb5f3b..8a728be --- a/src/test/regress/sql/rules.sql +++ b/src/test/regress/sql/rules.sql @@ -1264,6 +1264,21 @@ MERGE INTO rule_merge2 t USING (SELECT 1 DELETE WHEN NOT MATCHED THEN INSERT VALUES (s.a, ''); +SELECT * FROM rule_merge2; + +-- and should be ok after dropping rules +DROP RULE rule1 ON rule_merge1; +DROP RULE rule2 ON rule_merge1; +DROP RULE rule3 ON rule_merge1; +MERGE INTO rule_merge1 t USING (SELECT 1 AS a) s + ON t.a = s.a + WHEN MATCHED AND t.a < 2 THEN + UPDATE SET b = b || ' updated by merge, had rules' + WHEN MATCHED AND t.a > 2 THEN + DELETE + WHEN NOT MATCHED THEN + INSERT VALUES (s.a, 'had rules'); +SELECT * FROM rule_merge1; -- -- Test enabling/disabling
Another multi-row VALUES bug
I was thinking some more about the recent fix to multi-row VALUES handling in the rewriter (b8f2687fdc), and I realised that there is another bug in the way DEFAULT values are handled: In RewriteQuery(), the code assumes that in a multi-row INSERT query, the VALUES RTE will be the only thing in the query's fromlist. That's true for the original query, but it's not necessarily the case for product queries, if the rule action performs a multi-row insert, leading to a new VALUES RTE that the DEFAULT-handling code might fail to process. For example: CREATE TABLE foo(a int); INSERT INTO foo VALUES (1); CREATE TABLE foo_log(t timestamptz DEFAULT now(), a int, c text); CREATE RULE foo_r AS ON UPDATE TO foo DO ALSO INSERT INTO foo_log VALUES (DEFAULT, old.a, 'old'), (DEFAULT, new.a, 'new'); UPDATE foo SET a = 2 WHERE a = 1; ERROR: unrecognized node type: 43 There's a similar example to this in the regression tests, but it doesn't test DEFAULT-handling. It's also possible for the current code to cause the same VALUES RTE to be rewritten multiple times, when recursing into product queries (if the rule action doesn't add any more stuff to the query's fromlist). That turns out to be harmless, because the second time round it will no longer contain any defaults, but it's technically incorrect, and certainly a waste of cycles. So I think what the code needs to do is examine the targetlist, and identify the VALUES RTE that the current query is using as a source, and rewrite just that RTE (so any original VALUES RTE is rewritten at the top level, and any VALUES RTEs from rule actions are rewritten while recursing, and none are rewritten more than once), as in the attached patch. While at it, I noticed an XXX code comment questioning whether any of this applies to MERGE. The answer is "no", because MERGE actions don't allow multi-row inserts, so I think it's worth updating that comment to make that clearer. Regards, Dean diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c new file mode 100644 index fb0c687..5ec4b91 --- a/src/backend/rewrite/rewriteHandler.c +++ b/src/backend/rewrite/rewriteHandler.c @@ -3743,26 +3743,34 @@ RewriteQuery(Query *parsetree, List *rew */ if (event == CMD_INSERT) { + ListCell *lc2; RangeTblEntry *values_rte = NULL; /* * If it's an INSERT ... VALUES (...), (...), ... there will be a - * single RTE for the VALUES targetlists. + * unique VALUES RTE referred to by the targetlist. At the top + * level, this will be the VALUES lists from the original query, + * and while recursively processing each product query, it will be + * the VALUES lists from the rule action, if any. As with + * rewriteValuesRTE(), we need only worry about targetlist entries + * that contain simple Vars referencing the VALUES RTE. */ - if (list_length(parsetree->jointree->fromlist) == 1) + foreach(lc2, parsetree->targetList) { -RangeTblRef *rtr = (RangeTblRef *) linitial(parsetree->jointree->fromlist); +TargetEntry *tle = (TargetEntry *) lfirst(lc2); -if (IsA(rtr, RangeTblRef)) +if (IsA(tle->expr, Var)) { - RangeTblEntry *rte = rt_fetch(rtr->rtindex, + Var *var = (Var *) tle->expr; + RangeTblEntry *rte = rt_fetch(var->varno, parsetree->rtable); if (rte->rtekind == RTE_VALUES) { values_rte = rte; - values_rte_index = rtr->rtindex; + values_rte_index = var->varno; } + break; } } @@ -3837,7 +3845,11 @@ RewriteQuery(Query *parsetree, List *rew break; case CMD_UPDATE: case CMD_INSERT: - /* XXX is it possible to have a VALUES clause? 
*/ + + /* + * MERGE actions do not permit multi-row INSERTs, so + * there is no VALUES RTE to deal with here. + */ action->targetList = rewriteTargetListIU(action->targetList, action->commandType, diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out new file mode 100644 index 624d0e5..9c28dd9 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -2938,11 +2938,11 @@ select pg_get_viewdef('shoe'::regclass,0 -- -- check multi-row VALUES in rules -- -create table rules_src(f1 int, f2 int); -create table rules_log(f1 int, f2 int, tag text); +create table rules_src(f1 int, f2 int default 0); +create table rules_log(f1 int, f2 int, tag text, id serial); insert into rules_src values(1,2), (11,12); create rule r1 as on update to rules_src do also - insert into rules_log values(old.*, 'old'), (new.*, 'new'); + insert into rules_log values(old.*, 'old', default), (new.*, 'new', default); update rules_src set f2 = f2 + 1; update rules_src set f2 = f2 * 10; select * from rules_src; @@ -2953,16 +2953,16 @@ select * from rules_src; (2 rows) select * from rules_log; - f1 | f2 | tag -+-+- - 1 | 2
Re: [PATCH] Enable using llvm jitlink as an alternative llvm jit linker of old Rtdyld.
On Wed, 23 Nov 2022 at 23:13, Alex Fan wrote: > I am new to the postgres community and apologise for resending this as the > previous one didn't include patch properly and didn't cc reviewers (maybe the > reason it has been buried in mailing list for months) Welcome to the community! I've not looked at your patch, but I have noticed that you have assigned some reviewers to the CF entry yourself. Unless these people know about that, this is likely a bad choice. People usually opt to review patches of their own accord rather than because the patch author put their name on the reviewer list. There are a few reasons that the patch might not be getting much attention: 1. The CF entry ([1]) states that the patch is "Waiting on Author". If you've done what you need to do, and are waiting for review, "Needs review" might be a better state. Currently people browsing the CF app will assume you need to do more work before it's worth looking at your patch. 2. The CF entry already has reviewers listed. People looking for a patch to review are probably more likely to pick one with no reviewers listed as they'd expect the existing listed reviewers to be taking care of reviews for a particular patch. The latter might be unlikely to happen given you've assigned reviewers yourself without asking them (at least you didn't ask me after you put me on the list). 3. Target version is 17. What's the reason for that? The next version is 16. I'd recommend setting the patch to "Needs review" and removing all the reviewers that have not confirmed to you that they'll review the patch. I'd also leave the target version blank or set it to 16. There might be a bit more useful information for you in [2]. David [1] https://commitfest.postgresql.org/40/3857/ [2] https://wiki.postgresql.org/wiki/Submitting_a_Patch
Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency
On 2022-Nov-23, Alvaro Herrera wrote: > I suggest that we could improve that elog() so that it includes the > members of the multixact in question, which could help us better > understand what is going on. Something like the attached. It would result in output like this: WARNING: new multixact has more than one updating member: 0 2[17378 (keysh), 17381 (nokeyupd)] Then it should be possible to trace (in pg_waldump output) the operations of each of the transactions that have any status in the multixact that includes some form of "upd". -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Just treat us the way you want to be treated + some extra allowance for ignorance."(Michael Brusser) diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index 204aa95045..e1191a7564 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -799,7 +799,8 @@ MultiXactIdCreateFromMembers(int nmembers, MultiXactMember *members) if (ISUPDATE_from_mxstatus(members[i].status)) { if (has_update) - elog(ERROR, "new multixact has more than one updating member"); + elog(ERROR, "new multixact has more than one updating member: %s", + mxid_to_string(InvalidMultiXactId, nmembers, members)); has_update = true; } }
Re: Perform streaming logical transactions by background workers and parallel apply
On Tue, Nov 22, 2022 at 7:23 PM Hayato Kuroda (Fujitsu) wrote: > > > 07. proto.c - logicalrep_write_stream_abort() > > We may able to add assertions for abort_lsn and abort_time, like xid and > subxid. > If you see logicalrep_write_stream_commit(), we have an assertion for xid but not for LSN and other parameters. I think the current coding in the patch is consistent with that. > > 08. guc_tables.c - ConfigureNamesInt > > ``` > _sync_workers_per_subscription, > + 2, 0, MAX_PARALLEL_WORKER_LIMIT, > + NULL, NULL, NULL > + }, > ``` > > The upper limit for max_sync_workers_per_subscription seems to be wrong, it > should > be used for max_parallel_apply_workers_per_subscription. > Right, I don't know why this needs to be changed in the first place. -- With Regards, Amit Kapila.
Re: [PATCH] Enable using llvm jitlink as an alternative llvm jit linker of old Rtdyld.
Hi, I am new to the postgres community and apologise for resending this as the previous one didn't include patch properly and didn't cc reviewers (maybe the reason it has been buried in mailing list for months) Adding to previous email, this patch exposes its own C API for creating ObjectLinkingLayer in a similar fashion as LLVMOrcCreateRTDyldObjectLinkingLayerWithSectionMemoryManager since orc doesn't expose it yet. Thanks and really appreciate if someone can offer a review to this and help get it merged. Cheers, Alex On Mon, Aug 29, 2022 at 5:46 PM Alex Fan wrote: > This brings the bonus of support jitting on riscv64 (included in this > patch) > and other platforms Rtdyld doesn't support, e.g. windows COFF. > > Currently, llvm doesn't expose jitlink (ObjectLinkingLayer) via C API, so > a wrapper is added. This also adds minor llvm 15 compat fix that is needed > --- > config/llvm.m4| 1 + > src/backend/jit/llvm/llvmjit.c| 67 +-- > src/backend/jit/llvm/llvmjit_wrap.cpp | 35 ++ > src/include/jit/llvmjit.h | 9 > 4 files changed, 108 insertions(+), 4 deletions(-) > > diff --git a/config/llvm.m4 b/config/llvm.m4 > index 3a75cd8b4d..a31b8b304a 100644 > --- a/config/llvm.m4 > +++ b/config/llvm.m4 > @@ -75,6 +75,7 @@ AC_DEFUN([PGAC_LLVM_SUPPORT], >engine) pgac_components="$pgac_components $pgac_component";; >debuginfodwarf) pgac_components="$pgac_components $pgac_component";; >orcjit) pgac_components="$pgac_components $pgac_component";; > + jitlink) pgac_components="$pgac_components $pgac_component";; >passes) pgac_components="$pgac_components $pgac_component";; >native) pgac_components="$pgac_components $pgac_component";; >perfjitevents) pgac_components="$pgac_components $pgac_component";; > diff --git a/src/backend/jit/llvm/llvmjit.c > b/src/backend/jit/llvm/llvmjit.c > index 6c72d43beb..d8b840da8c 100644 > --- a/src/backend/jit/llvm/llvmjit.c > +++ b/src/backend/jit/llvm/llvmjit.c > @@ -229,6 +229,11 @@ llvm_release_context(JitContext *context) > LLVMModuleRef > llvm_mutable_module(LLVMJitContext *context) > { > +#ifdef __riscv > + const char* abiname; > + const char* target_abi = "target-abi"; > + LLVMMetadataRef abi_metadata; > +#endif > llvm_assert_in_fatal_section(); > > /* > @@ -241,6 +246,40 @@ llvm_mutable_module(LLVMJitContext *context) > context->module = LLVMModuleCreateWithName("pg"); > LLVMSetTarget(context->module, llvm_triple); > LLVMSetDataLayout(context->module, llvm_layout); > +#ifdef __riscv > +#if __riscv_xlen == 64 > +#ifdef __riscv_float_abi_double > + abiname = "lp64d"; > +#elif defined(__riscv_float_abi_single) > + abiname = "lp64f"; > +#else > + abiname = "lp64"; > +#endif > +#elif __riscv_xlen == 32 > +#ifdef __riscv_float_abi_double > + abiname = "ilp32d"; > +#elif defined(__riscv_float_abi_single) > + abiname = "ilp32f"; > +#else > + abiname = "ilp32"; > +#endif > +#else > + elog(ERROR, "unsupported riscv xlen %d", __riscv_xlen); > +#endif > + /* > +* set this manually to avoid llvm defaulting to soft > float and > +* resulting in linker error: `can't link double-float > modules > +* with soft-float modules` > +* we could set this for TargetMachine via MCOptions, but > there > +* is no C API for it > +* ref: > https://github.com/llvm/llvm-project/blob/afa520ab34803c82587ea6759bfd352579f741b4/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp#L90 > +*/ > + abi_metadata = LLVMMDStringInContext2( > + LLVMGetModuleContext(context->module), > + abiname, strlen(abiname)); > + LLVMAddModuleFlag(context->module, > LLVMModuleFlagBehaviorOverride, > + target_abi, strlen(target_abi), 
abi_metadata); > +#endif > } > > return context->module; > @@ -786,6 +825,8 @@ llvm_session_initialize(void) > char *error = NULL; > char *cpu = NULL; > char *features = NULL; > + LLVMRelocMode reloc=LLVMRelocDefault; > + LLVMCodeModel codemodel=LLVMCodeModelJITDefault; > LLVMTargetMachineRef opt0_tm; > LLVMTargetMachineRef opt3_tm; > > @@ -820,16 +861,21 @@ llvm_session_initialize(void) > elog(DEBUG2, "LLVMJIT detected CPU \"%s\", with features \"%s\"", > cpu, features); > > +#ifdef __riscv > + reloc=LLVMRelocPIC; > + codemodel=LLVMCodeModelMedium; > +#endif > + > opt0_tm = > LLVMCreateTargetMachine(llvm_targetref, llvm_triple, cpu, > features, > > LLVMCodeGenLevelNone, > - > LLVMRelocDefault, > - > LLVMCodeModelJITDefault); > +
Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
On Wed, Nov 23, 2022 at 2:42 PM Andres Freund wrote: > The failure has to be happening in wait_for_postmaster_promote(), because the > standby2 is actually successfully promoted. I assume this is ext4. Presumably anything that reads the controlfile, like pg_ctl, pg_checksums, pg_resetwal, pg_control_system(), ... by reading without interlocking against writes could see garbage. I have lost track of the versions and the thread, but I worked out at some point by experimentation that this only started relatively recently for concurrent read() and write(), but always happened with concurrent pread() and pwrite(). The control file uses the non-p variants which didn't mash old/new data like grated cheese under concurrency due to some implementation detail, but now does.
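For reference, the "control file appears to be corrupt" message corresponds to the pg_control CRC check failing after such an unsynchronized read. The following is a simplified, illustrative sketch of the validation that frontend readers of pg_control effectively perform (condensed from the usual get_controlfile() logic; error handling is omitted and the declarations from catalog/pg_control.h, port/pg_crc32c.h and unistd.h are assumed):

    /*
     * Read pg_control and verify its CRC.  A read() that interleaves with a
     * concurrent rewrite of the file can return a mix of old and new bytes,
     * in which case the computed CRC no longer matches the stored one and
     * callers such as pg_ctl report the file as corrupt.
     */
    static bool
    read_and_check_controlfile(int fd, ControlFileData *ControlFile)
    {
        pg_crc32c   crc;

        if (read(fd, ControlFile, sizeof(ControlFileData)) !=
            (ssize_t) sizeof(ControlFileData))
            return false;

        /* The CRC covers everything up to, but not including, the crc field. */
        INIT_CRC32C(crc);
        COMP_CRC32C(crc, (char *) ControlFile, offsetof(ControlFileData, crc));
        FIN_CRC32C(crc);

        return EQ_CRC32C(crc, ControlFile->crc);
    }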
Re: [PoC] Federated Authn/z with OAUTHBEARER
Hi,

We validated libpq handling OAuth natively with different flows against different OIDC-certified providers.

Flows: Device Code, Client Credentials and Refresh Token.
Providers: Microsoft, Google and Okta.
Also validated with the OAuth provider Github.

We propose using OpenID Connect (OIDC) as the protocol, instead of plain OAuth, because it provides:
- A discovery mechanism to bridge the differences between providers and supply metadata.
- A stricter protocol and certification process, making it possible to reliably identify which providers can be supported.
- Authentication as its primary purpose, while the main purpose of OAuth is to authorize applications on behalf of the user.

Github is not OIDC certified, so it won't be supported with this proposal. However, it may be supported in the future through the ability for the extension to provide custom discovery document content.

OpenID configuration has a well-known discovery mechanism for the provider configuration URI, which is defined in OpenID Connect. It allows libpq to fetch metadata about the provider (i.e. endpoints, supported grants, response types, etc).

In the attached patch (based on the V2 patch in the thread; it does not contain Samay's changes):
- The provider can configure the issuer URL and scope through the options hook.
- The server passes on an open discovery URL and scope to libpq.
- libpq handles the OAuth flow based on the flow_type sent in the connection string [1].
- Added callbacks to notify a structure to client tools if the OAuth flow requires user interaction.
- The PG backend uses hooks to validate the bearer token.

Note that the authorization code flow with PKCE for GUI clients is not implemented yet.

Proposed next steps:
- Broaden the discussion to reach agreement on the approach.
- Implement the libpq changes without iddawc.
- Prototype the GUI flow with pgAdmin.

Thanks,
Mahendrakar.

[1]: connection string for the refresh token flow:
./psql -U -d 'dbname=postgres oauth_client_id= oauth_flow_type= oauth_refresh_token='

On Mon, 3 Oct 2022 at 23:34, Andrey Chudnovsky wrote:
>
> > I think we can probably prototype a callback hook for approach (1)
> > pretty quickly. (2) is a lot more work and investigation, but it's
> > work that I'm interested in doing (when I get the time). I think there
> > are other very good reasons to consider a third-party SASL library,
> > and some good lessons to be learned, even if the community decides not
> > to go down that road.
>
> Makes sense. We will work on (1.) and do some check if there are any
> blockers for a shared solution to support github and google.
>
> On Fri, Sep 30, 2022 at 1:45 PM Jacob Champion
> wrote:
> >
> > On Fri, Sep 30, 2022 at 7:47 AM Andrey Chudnovsky
> > wrote:
> > > > How should we communicate those pieces to a custom client when it's
> > > > passing a token directly? The easiest way I can see is for the custom
> > > > client to speak the OAUTHBEARER protocol directly (e.g. SASL plugin).
> > > > If you had to parse the libpq error message, I don't think that'd be
> > > > particularly maintainable.
> > >
> > > I agree that parsing the message is not a sustainable way.
> > > Could you provide more details on the SASL plugin approach you propose?
> > >
> > > Specifically, is this basically a set of extension hooks for the client
> > > side?
> > > With the need for the client to be compiled with the plugins based on
> > > the set of providers it needs.
> >
> > That's a good question. I can see two broad approaches, with maybe
> > some ability to combine them into a hybrid:
> >
> > 1.
If there turns out to be serious interest in having libpq itself > > handle OAuth natively (with all of the web-facing code that implies, > > and all of the questions still left to answer), then we might be able > > to provide a "token hook" in the same way that we currently provide a > > passphrase hook for OpenSSL keys. By default, libpq would use its > > internal machinery to take the provider details, navigate its builtin > > flow, and return the Bearer token. If you wanted to override that > > behavior as a client, you could replace the builtin flow with your > > own, by registering a set of callbacks. > > > > 2. Alternatively, OAuth support could be provided via a mechanism > > plugin for some third-party SASL library (GNU libgsasl, Cyrus > > libsasl2). We could provide an OAuth plugin in contrib that handles > > the default flow. Other providers could publish their alternative > > plugins to completely replace the OAUTHBEARER mechanism handling. > > > > Approach (2) would make for some duplicated effort since every > > provider has to write code to speak the OAUTHBEARER protocol. It might > > simplify provider-specific distribution, since (at least for Cyrus) I > > think you could build a single plugin that supports both the client > > and server side. But it would be a lot easier to unknowingly (or > > knowingly) break the spec, since you'd control both the client and > > server sides. There would be less incentive to interoperate. > > > > Finally, we could potentially take pieces
Re: Logical Replication Custom Column Expression
Just one correction for the subscriber On Subscriber: test_sub=# CREATE TABLE tab(id int *pkey*, description varchar, subscription varchar *pkey*); CREATE TABLE The subscription table should have the same primary key columns as the publisher plus one more. We need to make sure that on update only the same origin data is being updated. Στις Τετ 23 Νοε 2022 στις 1:24 π.μ., ο/η Peter Smith έγραψε: > On Wed, Nov 23, 2022 at 7:38 AM Stavros Koureas > wrote: > > > > Reading more carefully what you described, I think you are interested in > getting something you call origin from publishers, probably some metadata > from the publications. > > > > This identifier in those metadata maybe does not have business value on > the reporting side. The idea is to use a value which has specific meaning > to the user at the end. > > > > For example assigning 1 for tenant 1, 2 for tenant 2 and so one, at the > end based on a dimension table which holds this mapping the user would be > able to filter the data. So programmatically the user can set the id value > of the column plus creating the mapping table from an application let’s say > and be able to distinguish the data. > > > > In addition this column should have the ability to be part of the > primary key on the subscription table in order to not conflict with lines > from other tenants having the same keys. > > > > > > I was wondering if a simpler syntax solution might also work here. > > Imagine another SUBSCRIPTION parameter that indicates to write the > *name* of the subscription to some pre-defined table column: > e.g. CREATE SUBSCRIPTION subname FOR PUBLICATION pub_tenant_1 > CONNECTION '...' WITH (subscription_column); > > Logical Replication already allows the subscriber table to have extra > columns, so you just need to manually create the extra 'subscription' > column up-front. > > Then... > > ~~ > > On Publisher: > > test_pub=# CREATE TABLE tab(id int primary key, description varchar); > CREATE TABLE > > test_pub=# INSERT INTO tab VALUES (1,'one'),(2,'two'),(3,'three'); > INSERT 0 3 > > test_pub=# CREATE PUBLICATION tenant1 FOR ALL TABLES; > CREATE PUBLICATION > > ~~ > > On Subscriber: > > test_sub=# CREATE TABLE tab(id int, description varchar, subscription > varchar); > CREATE TABLE > > test_sub=# CREATE SUBSCRIPTION sub_tenant1 CONNECTION 'host=localhost > dbname=test_pub' PUBLICATION tenant1 WITH (subscription_column); > CREATE SUBSCRIPTION > > test_sub=# SELECT * FROM tab; > id | description | subscription > +-+-- > 1 | one | sub_tenant1 > 2 | two | sub_tenant1 > 3 | three | sub_tenant1 > (3 rows) > > ~~ > > Subscriptions to different tenants would be named differently. > > And using other SQL you can map/filter those names however your > application wants. > > -- > Kind Regards, > Peter Smith. > Fujitsu Australia >
Re: Logical Replication Custom Column Expression
It's easy to answer this question. Imagine a software company that sells an ERP product and also offers reporting solutions: the ERP tables will not have this additional column in all the tables. Now the reporting department needs to consolidate the data from different databases (publishers) into one multi-tenant database that holds all the data. In an ERP like NAV, or any other, you cannot suggest changing all the code for all of the tables plus all the functions just to add one additional column to each table. Even if that were possible, you could not work with integers but would have to work with GUIDs, since this column would have to be predefined for each ERP. Joining on a GUID in the second phase, for reporting, will definitely slow down performance.

In summary:
1. We cannot touch the underlying source (important).
2. A GUID identifier column will slow down reporting performance.

On Wed, Nov 23, 2022 at 5:19 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Wed, Nov 23, 2022 at 1:40 AM Stavros Koureas wrote:
> >
> > Reading more carefully what you described, I think you are interested in
> > getting something you call origin from publishers, probably some metadata
> > from the publications.
> >
> > This identifier in those metadata maybe does not have business value on
> > the reporting side. The idea is to use a value which has specific meaning
> > to the user at the end.
> >
> > For example assigning 1 for tenant 1, 2 for tenant 2 and so one, at the
> > end based on a dimension table which holds this mapping the user would be
> > able to filter the data. So programmatically the user can set the id value
> > of the column plus creating the mapping table from an application let's say
> > and be able to distinguish the data.
>
> In your example, are different tenants represent different publisher
> nodes? If so, why can't we have a predefined column and value for the
> required tables on each publisher rather than logical replication
> generate that value while replicating data?
>
> --
> With Regards,
> Amit Kapila.
Re: Non-decimal integer literals
On Wed, Nov 23, 2022 at 3:54 PM David Rowley wrote:
>
> Going by [1], clang will actually use multiplication by 16 to
> implement the former. gcc is better and shifts left by 4, so likely
> won't improve things for gcc. It seems worth doing it this way for
> anything that does not have HAVE__BUILTIN_OP_OVERFLOW anyway.

FWIW, gcc 12.2 generates an imul on my system when compiling in situ.

I've found it useful to run godbolt locally* and load the entire PG file
(nicer to read than plain objdump) -- compilers can make different
decisions when going from isolated snippets to within full functions.

* To do so:
- clone from https://github.com/compiler-explorer/compiler-explorer
- install npm 16
- run "make"; when finished it will show the localhost url
- add the right flags, which in this case was

-Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
-Werror=vla -Wendif-labels -Wmissing-format-attribute
-Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security
-fno-strict-aliasing -fwrapv -fexcess-precision=standard
-Wno-format-truncation -Wno-stringop-truncation -O2
-I/path/to/srcdir/src/include -I/path/to/builddir/src/include -D_GNU_SOURCE

--
John Naylor
EDB: http://www.enterprisedb.com
Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency
Hello Dimos On 2022-Nov-22, Dimos Stamatakis wrote: > When running tpcc on sysbench with high concurrency (96 threads, scale > factor 5) we realized that a fix for visibility check (introduced in > PG-14.5) causes sysbench to fail in 1 out of 70 runs. > The error is the following: > > SQL error, errno = 0, state = 'XX000': new multixact has more than one > updating member Ouch. I did not remember any reports of this. Searching I found this recent one: https://postgr.es/m/17518-04e368df5ad7f...@postgresql.org However, the reporter there says they're using 12.10, and according to src/tools/git_changelog the commit appeared only in 12.12: Author: Heikki Linnakangas Branch: master Release: REL_15_BR [adf6d5dfb] 2022-06-27 08:21:08 +0300 Branch: REL_14_STABLE Release: REL_14_5 [e24615a00] 2022-06-27 08:24:30 +0300 Branch: REL_13_STABLE Release: REL_13_8 [7ba325fd7] 2022-06-27 08:24:35 +0300 Branch: REL_12_STABLE Release: REL_12_12 [af530898e] 2022-06-27 08:24:36 +0300 Branch: REL_11_STABLE Release: REL_11_17 [b49889f3c] 2022-06-27 08:24:37 +0300 Branch: REL_10_STABLE Release: REL_10_22 [4822b4627] 2022-06-27 08:24:38 +0300 Fix visibility check when XID is committed in CLOG but not in procarray. [...] Thinking further, one problem in tracking this down is that at this point the multixact in question is *being created*, so we don't have a WAL trail we could trace through. I suggest that we could improve that elog() so that it includes the members of the multixact in question, which could help us better understand what is going on. > This commit was supposed to fix a race condition during the visibility > check. Please let us know whether you are aware of this issue and if > there is a quick fix. I don't think so. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: Logical replication missing information
Hello!Thank you very much, the information you just sent to me is very, very valuable! Best regards, Cristi Boboc On Wednesday, November 23, 2022 at 04:45:09 AM GMT+2, Peter Smith wrote: On Fri, Nov 18, 2022 at 4:50 AM PG Doc comments form wrote: > > The following documentation comment has been logged on the website: > > Page: https://www.postgresql.org/docs/15/logical-replication-row-filter.html > Description: Hi, FYI - I have forwarded this post to the hacker's list, where I think it will receive more attention. I am not sure why that (above) page was cited -- the section "31.3 Row Filters" is specifically about row filtering, whereas the items you reported seem unrelated to row filters, but are generic for all Logical Replication. > > There are several things missing here and some of them I found to be highly > important: > 1. How can I find why a logical replication failed. Currently I only can see > it "does nothing" in pg_stat_subscriptions. There should be logs reporting any replication conflicts etc. See [1] for example logs. See also the answer for #2 below. > 2. In case of copying the existing data: how can I find which tables or > partitions were processed and which are on the processing queue (while > monitoring I have observed no specific order or rule). There is no predictable processing queue or order - The initial tablesyncs might be happening in multiple asynchronous processes according to the GUC max_sync_workers_per_subscription [2]. Below I show examples of replicating two tables (tab1 and tab2). ~~ >From the logs you should see which table syncs have completed OK: e.g. (the initial copy is all good) test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost dbname=test_pub' PUBLICATION pub1; NOTICE: created replication slot "sub1" on publisher CREATE SUBSCRIPTION test_sub=# 2022-11-23 12:23:18.501 AEDT [27961] LOG: logical replication apply worker for subscription "sub1" has started 2022-11-23 12:23:18.513 AEDT [27963] LOG: logical replication table synchronization worker for subscription "sub1", table "tab1" has started 2022-11-23 12:23:18.524 AEDT [27965] LOG: logical replication table synchronization worker for subscription "sub1", table "tab2" has started 2022-11-23 12:23:18.593 AEDT [27963] LOG: logical replication table synchronization worker for subscription "sub1", table "tab1" has finished 2022-11-23 12:23:18.611 AEDT [27965] LOG: logical replication table synchronization worker for subscription "sub1", table "tab2" has finished e.g. (where there is conflict in table tab2) test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost dbname=test_pub' PUBLICATION pub1; NOTICE: created replication slot "sub1" on publisher CREATE SUBSCRIPTION test_sub=# 2022-11-23 12:40:56.794 AEDT [23401] LOG: logical replication apply worker for subscription "sub1" has started 2022-11-23 12:40:56.808 AEDT [23403] LOG: logical replication table synchronization worker for subscription "sub1", table "tab1" has started 2022-11-23 12:40:56.819 AEDT [23405] LOG: logical replication table synchronization worker for subscription "sub1", table "tab2" has started 2022-11-23 12:40:56.890 AEDT [23405] ERROR: duplicate key value violates unique constraint "tab2_pkey" 2022-11-23 12:40:56.890 AEDT [23405] DETAIL: Key (id)=(1) already exists. 
2022-11-23 12:40:56.890 AEDT [23405] CONTEXT: COPY tab2, line 1 2022-11-23 12:40:56.891 AEDT [3233] LOG: background worker "logical replication worker" (PID 23405) exited with exit code 1 2022-11-23 12:40:56.902 AEDT [23403] LOG: logical replication table synchronization worker for subscription "sub1", table "tab1" has finished ... ~~ Alternatively, you can use some SQL query to discover which tables of the subscription had attained a READY state. The READY state (denoted by 'r') means that the initial COPY was completed ok. The table replication state is found in the 'srsubstate' column. See [3] e.g. (the initial copy is all good) test_sub=# select sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s where ut.relid = sr.srrelid and s.oid=sr.srsubid; srsubid | srrelid | subname | relname | srsubstate -+-+-+-+ 16418 | 16409 | sub1 | tab1 | r 16418 | 16402 | sub1 | tab2 | r (2 rows) e.g. (where it has a conflict in table tab2, so it did not get to READY state) test_sub=# select sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s where ut.relid = sr.srrelid and s.oid=sr.srsubid;2022-11-23 12:41:37.686 AEDT [24501] LOG: logical replication table synchronization worker for subscription "sub1", table "tab2" has started 2022-11-23 12:41:37.774 AEDT [24501] ERROR: duplicate key value violates unique constraint "tab2_pkey" 2022-11-23 12:41:37.774 AEDT [24501] DETAIL: Key (id)=(1) already exists. 2022-11-23
Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
On 2022-Nov-22, Andres Freund wrote:

> ok 10 - standby is in recovery
> # Running: pg_ctl -D
> /mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/src/bin/pg_ctl/tmp_check/t_003_promote_standby2_data/pgdata
> promote
> waiting for server to promotepg_ctl: control file appears to be corrupt
> not ok 11 - pg_ctl promote of standby runs
>
> # Failed test 'pg_ctl promote of standby runs'
> # at
> /mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/TestLib.pm
> line 474.

This reminded me of this proposal I saw yesterday:
https://postgr.es/m/02fe0063-bf77-90d0-3cf5-e9fe7c2a4...@postgrespro.ru

I think trying to store more stuff in pg_control is dangerous and we
should resist it.

--
Álvaro Herrera
PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: Operation log for major operations
On 2022-Nov-21, Dmitry Koval wrote: > Concepts. > - > * operation log is placed in the file 'global/pg_control', starting from > position 4097 (log size is 4kB); I think storing this in pg_control is a bad idea. That file is extremely critical and if you break it, you're pretty much SOL on recovering your data. I suggest that this should use a separate file. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "All rings of power are equal, But some rings of power are more equal than others." (George Orwell's The Lord of the Rings)
Re: Non-decimal integer literals
On Wed, 23 Nov 2022 at 21:54, David Rowley wrote: > I wonder if you'd be better off with something like: > > while (*ptr && isxdigit((unsigned char) *ptr)) > { > if (unlikely(tmp & UINT64CONST(0xF000))) > goto out_of_range; > > tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++]; > } Here's a delta diff with it changed to work that way. David diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c index 2942b7c449..ce305b611d 100644 --- a/src/backend/utils/adt/numutils.c +++ b/src/backend/utils/adt/numutils.c @@ -136,13 +136,10 @@ pg_strtoint16(const char *s) ptr += 2; while (*ptr && isxdigit((unsigned char) *ptr)) { - int8digit = hexlookup[(unsigned char) *ptr]; - - if (unlikely(pg_mul_s16_overflow(tmp, 16, )) || - unlikely(pg_sub_s16_overflow(tmp, digit, ))) + if (unlikely(tmp & 0xF000)) goto out_of_range; - ptr++; + tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++]; } } else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O')) @@ -151,11 +148,10 @@ pg_strtoint16(const char *s) while (*ptr && (*ptr >= '0' && *ptr <= '7')) { - int8digit = (*ptr++ - '0'); - - if (unlikely(pg_mul_s16_overflow(tmp, 8, )) || - unlikely(pg_sub_s16_overflow(tmp, digit, ))) + if (unlikely(tmp & 0xE000)) goto out_of_range; + + tmp = (tmp << 3) | (*ptr++ - '0'); } } else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B')) @@ -164,11 +160,10 @@ pg_strtoint16(const char *s) while (*ptr && (*ptr >= '0' && *ptr <= '1')) { - int8digit = (*ptr++ - '0'); - - if (unlikely(pg_mul_s16_overflow(tmp, 2, )) || - unlikely(pg_sub_s16_overflow(tmp, digit, ))) + if (unlikely(tmp & 0x8000)) goto out_of_range; + + tmp = (tmp << 1) | (*ptr++ - '0'); } } else @@ -255,13 +250,10 @@ pg_strtoint32(const char *s) ptr += 2; while (*ptr && isxdigit((unsigned char) *ptr)) { - int8digit = hexlookup[(unsigned char) *ptr]; - - if (unlikely(pg_mul_s32_overflow(tmp, 16, )) || - unlikely(pg_sub_s32_overflow(tmp, digit, ))) + if (unlikely(tmp & 0xF000)) goto out_of_range; - ptr++; + tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++]; } } else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O')) @@ -270,11 +262,10 @@ pg_strtoint32(const char *s) while (*ptr && (*ptr >= '0' && *ptr <= '7')) { - int8digit = (*ptr++ - '0'); - - if (unlikely(pg_mul_s32_overflow(tmp, 8, )) || - unlikely(pg_sub_s32_overflow(tmp, digit, ))) + if (unlikely(tmp & 0xE000)) goto out_of_range; + + tmp = (tmp << 3) | (*ptr++ - '0'); } } else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B')) @@ -283,11 +274,10 @@ pg_strtoint32(const char *s) while (*ptr && (*ptr >= '0' && *ptr <= '1')) { - int8digit = (*ptr++ - '0'); - - if (unlikely(pg_mul_s32_overflow(tmp, 2, )) || - unlikely(pg_sub_s32_overflow(tmp, digit, ))) + if (unlikely(tmp & 0x8000)) goto out_of_range; + + tmp = (tmp << 1) | (*ptr++ - '0'); } } else @@ -382,13 +372,10 @@ pg_strtoint64(const char *s) ptr += 2; while (*ptr && isxdigit((unsigned char) *ptr)) { - int8digit = hexlookup[(unsigned char) *ptr]; - - if (unlikely(pg_mul_s64_overflow(tmp, 16, )) || - unlikely(pg_sub_s64_overflow(tmp, digit, ))) + if (unlikely(tmp & UINT64CONST(0xF000))) goto out_of_range; - ptr++; + tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++]; }
Re: Non-decimal integer literals
On Wed, 23 Nov 2022 at 02:37, Peter Eisentraut wrote: > Here is a new patch. This looks like quite an inefficient way to convert a hex string into an int64: while (*ptr && isxdigit((unsigned char) *ptr)) { int8digit = hexlookup[(unsigned char) *ptr]; if (unlikely(pg_mul_s64_overflow(tmp, 16, )) || unlikely(pg_sub_s64_overflow(tmp, digit, ))) goto out_of_range; ptr++; } I wonder if you'd be better off with something like: while (*ptr && isxdigit((unsigned char) *ptr)) { if (unlikely(tmp & UINT64CONST(0xF000))) goto out_of_range; tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++]; } Going by [1], clang will actually use multiplication by 16 to implement the former. gcc is better and shifts left by 4, so likely won't improve things for gcc. It seems worth doing it this way for anything that does not have HAVE__BUILTIN_OP_OVERFLOW anyway. David [1] https://godbolt.org/z/jz6Th6jnM
Re: [DOCS] Stats views and functions not in order?
On Thu, Nov 17, 2022 at 8:46 AM David G. Johnston wrote: > > On Tue, Nov 15, 2022 at 6:39 PM Peter Smith wrote: >> >> >> I was also wondering (but have not yet done) if the content *outside* >> the tables should be reordered to match the table 28.1/28.2 order. >> >> Thoughts? >> Thanks for the feedback/suggestions > > I would love to do away with the ToC listing of view names in 28.2 altogether. > OK, done. See patch 0006. To prevent all the views sections from participating in the ToC I simply changed them to instead of . I’m not 100% sure if this was a brilliant modification or a total hack, but it does do exactly what you wanted. > Also, make it so each view ends up being its own separate page. > I did not do this. AFAIK those views of chapter 54 get rendered to separate pages only because they are top-level . So I do not know how to put all these stats views onto different pages without radically changing the document structure. Anyway – doing this would be incompatible with my changes of patch 0006 (see above). > The name of the views in the table should then be the hyperlinks to those > pages. > OK done. See patch 0005. All the view names (in column one of the tables) are hyperlinked to the views the same way as Chapter 54 does. The tables are a lot cleaner now. A couple of inconsistent view ids were also corrected. > Basically the way Chapter 54.1 works. Though the interplay between the top > Chapter 54 and 54.1 is a bit repetitive. > > https://www.postgresql.org/docs/current/views.html > > I wonder whether having the table be structured but the ToC be purely > alphabetical would be considered a good idea... > > The tables need hyperlinks regardless. I wouldn't insist on changing the > ordering to match the table, especially with the hyperlinks, but I also > wouldn't reject it. Figuring out how to make them one-per-page would be time > better spent though. > PSA new patches. Now there are 6 of them. If some of the earlier patches are agreeable can those ones please be committed? (because I think this patch may be susceptible to needing a big rebase if anything in those tables changes). -- Kind Regards, Peter Smith. Fujitsu Australia. v6-0002-Re-order-Table-28.2-Collected-Statistics-Views.patch Description: Binary data v6-0003-Re-order-Table-28.12-Wait-Events-of-type-LWLock.patch Description: Binary data v6-0005-Cleanup-view-name-hyperlinks-for-Tables-28.1-and-.patch Description: Binary data v6-0004-Re-order-Table-28.35-Per-Backend-Statistics-Funct.patch Description: Binary data v6-0006-Remove-all-stats-views-from-the-ToC-of-28.2.patch Description: Binary data v6-0001-Re-order-sections-of-28.4.-Progress-Reporting.patch Description: Binary data
Re: Prefetch the next tuple's memory during seqscans
On Tue, Nov 22, 2022 at 11:44 PM David Rowley wrote: > On Wed, 23 Nov 2022 at 20:29, sirisha chamarthi > wrote: > > I ran your test1 exactly like your setup except the row count is 300 > (with 13275 blocks). Shared_buffers is 128MB and the hardware configuration > details at the bottom of the mail. It appears Master + 0001 + 0005 > regressed compared to master slightly . > > Thank you for running these tests. > > Can you share if the plans used for these queries was a parallel plan? > I had set max_parallel_workers_per_gather to 0 to remove the > additional variability from parallel query. > > Also, 13275 blocks is 104MBs, does EXPLAIN (ANALYZE, BUFFERS) indicate > that all pages were in shared buffers? I used pg_prewarm() to ensure > they were so that the runs were consistent. > I reran the test with setting max_parallel_workers_per_gather = 0 and with pg_prewarm. Appears I missed some step while testing on the master, thanks for sharing the details. New numbers show master has higher latency than *Master + 0001 + 0005*. *Master* Before vacuum: latency average = 452.881 ms After vacuum: latency average = 393.880 ms *Master + 0001 + 0005* Before vacuum: latency average = 441.832 ms After vacuum: latency average = 369.591 ms
Re: New docs chapter on Transaction Management and related changes
On Tue, Nov 22, 2022 at 01:50:36PM -0500, Bruce Momjian wrote: > + > + > + A more complex example with multiple nested subtransactions: > + > +BEGIN; > +INSERT INTO table1 VALUES (1); > +SAVEPOINT sp1; > +INSERT INTO table1 VALUES (2); > +SAVEPOINT sp2; > +INSERT INTO table1 VALUES (3); > +RELEASE SAVEPOINT sp2; > +INSERT INTO table1 VALUES (4))); -- generates an error > + > + In this example, the application requests the release of the savepoint > + sp2, which inserted 3. This changes the insert's > + transaction context to sp1. When the statement > + attempting to insert value 4 generates an error, the insertion of 2 and > + 4 are lost because they are in the same, now-rolled back savepoint, > + and value 3 is in the same transaction context. The application can > + now only choose one of these two commands, since all other commands > + will be ignored with a warning: > + > + ROLLBACK; > + ROLLBACK TO SAVEPOINT sp1; > + > + Choosing ROLLBACK will abort everything, including > + value 1, whereas ROLLBACK TO SAVEPOINT sp1 will retain > + value 1 and allow the transaction to continue. > + This mentions a warning, but what happens is actually an error: postgres=!# select; ERROR: current transaction is aborted, commands ignored until end of transaction block
Re: More efficient build farm animal wakeup?
On Wed, Nov 23, 2022 at 2:09 PM Andres Freund wrote: > It's a huge improvement here. Same here. eelpout + elver looking good, just a fraction of a second hitting that web server each minute. Long polling will be better and shave off 30 seconds (+/- 30) on start time, but this avoids a lot of useless churn without even needing a local mirror. Thanks Andrew!