Decouple last important WAL record LSN from WAL insert locks

2022-11-23 Thread Bharath Rupireddy
Hi,

While working on something else, I noticed that each WAL insert lock
tracks its own last important WAL record's LSN (lastImportantAt), and
both the bgwriter and checkpointer later compute the maximum, i.e. the
server-wide last important WAL record's LSN, via
GetLastImportantRecPtr(). While doing so, each WAL insertion lock is
acquired in exclusive mode in a for loop. This seems like too much
overhead to me. I quickly coded a patch (attached herewith) that
tracks the server-wide last important WAL record's LSN in
XLogCtlInsert (lastImportantPos), protected with a spinlock, and gets
rid of lastImportantAt from each WAL insert lock. I ran pgbench with a
simple insert workload [1] and the results are below. While the test
was running, GetLastImportantRecPtr() was called 4-5 times.

# of clients    HEAD    PATCHED
           1      83         82
           2     159        157
           4     303        302
           8     576        570
          16    1104       1095
          32    2055       2041
          64    2286       2295
         128    2270       2285
         256    2302       2253
         512    2205       2290
         768    2224       2180
        1024    2109       2150
        2048    1941       1936
        4096    1856       1848

It doesn't seem to hurt anyone (for this use case); however, there
might be some benefit if the bgwriter and checkpointer get in the way
of WAL inserters. With the patch, the extra exclusive-lock burden on
WAL insert locks is gone. Since the amount of work the WAL inserters
do under the new spinlock is minimal (updating
XLogCtlInsert->lastImportantPos), it may not be an issue. Also, it's
worthwhile to look at the existing comment [2], which doesn't talk
about the performance impact of having a lock.
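
For illustration, a rough sketch of the proposed scheme follows. The
field and spinlock names are only assumptions taken from the
description above; the attached patch may differ in detail.

/* Sketch only: one server-wide LSN protected by a (hypothetical) spinlock. */
typedef struct XLogCtlInsert
{
	/* ... existing fields ... */
	slock_t		lastimportant_lck;	/* assumed name; protects the field below */
	XLogRecPtr	lastImportantPos;	/* replaces per-lock lastImportantAt */
} XLogCtlInsert;

/* A WAL inserter, after inserting an "important" record: */
SpinLockAcquire(&Insert->lastimportant_lck);
Insert->lastImportantPos = EndPos;
SpinLockRelease(&Insert->lastimportant_lck);

/* GetLastImportantRecPtr() no longer loops over the WAL insert locks: */
SpinLockAcquire(&Insert->lastimportant_lck);
res = Insert->lastImportantPos;
SpinLockRelease(&Insert->lastimportant_lck);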

Thoughts?

[1]
./configure --prefix=$PWD/inst/ CFLAGS="-O3" > install.log && make -j
8 install > install.log 2>&1 &
cd inst/bin
./pg_ctl -D data -l logfile stop
rm -rf data logfile insert.sql
free -m
sudo su -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
free -m
./initdb -D data
./pg_ctl -D data -l logfile start
./psql -d postgres -c 'ALTER SYSTEM SET shared_buffers = "8GB";'
./psql -d postgres -c 'ALTER SYSTEM SET max_wal_size = "32GB";'
./psql -d postgres -c 'ALTER SYSTEM SET max_connections = "4096";'
./psql -d postgres -c 'ALTER SYSTEM SET bgwriter_delay = "10ms";'
./pg_ctl -D data -l logfile restart
./pgbench -i -s 1 -d postgres
./psql -d postgres -c "ALTER TABLE pgbench_accounts DROP CONSTRAINT
pgbench_accounts_pkey;"
cat << EOF >> insert.sql
\set aid random(1, 10 * :scale)
\set delta random(1, 10 * :scale)
INSERT INTO pgbench_accounts (aid, bid, abalance) VALUES (:aid, :aid, :delta);
EOF
for c in 1 2 4 8 16 32 64 128 256 512 768 1024 2048 4096; do echo -n
"$c ";./pgbench -n -M prepared -U ubuntu postgres -b simple-update
-c$c -j$c -T5 2>&1|grep '^tps'|awk '{print $3}';done

[2]
 * records.  Tracking the WAL activity directly in WALInsertLock has the
 * advantage of not needing any additional locks to update the value.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v1-0001-Decouple-last-important-WAL-record-LSN-from-WAL-i.patch
Description: Binary data


Re: Perform streaming logical transactions by background workers and parallel apply

2022-11-23 Thread Amit Kapila
On Tue, Nov 22, 2022 at 7:30 AM houzj.f...@fujitsu.com
 wrote:
>

Few minor comments and questions:

1.
+static void
+LogicalParallelApplyLoop(shm_mq_handle *mqh)
{
+ for (;;)
+ {
+ void*data;
+ Size len;
+
+ ProcessParallelApplyInterrupts();
...
...
+ if (rc & WL_LATCH_SET)
+ {
+ ResetLatch(MyLatch);
+ ProcessParallelApplyInterrupts();
+ }
...
}

Why is ProcessParallelApplyInterrupts() called twice in
LogicalParallelApplyLoop()?

2.
+ * This scenario is similar to the first case but TX-1 and TX-2 are executed by
+ * two parallel apply workers (PA-1 and PA-2 respectively). In this scenario,
+ * PA-2 is waiting for PA-1 to complete its transaction while PA-1 is waiting
+ * for subsequent input from LA. Also, LA is waiting for PA-2 to complete its
+ * transaction in order to preserve the commit order. There is a deadlock among
+ * three processes.
+ *
...
...
+ *
+ * LA (waiting to acquire the local transaction lock) -> PA-1 (waiting to
+ * acquire the lock on the unique index) -> PA-2 (waiting to acquire the lock
+ * on the remote transaction) -> LA
+ *

Isn't the order of PA-1 and PA-2 different in the second paragraph as
compared to the first one?

3.
+ * Deadlock-detection
+ * --

It may be better to keep the title of this section as Locking Considerations.

4. In the section mentioned in Point 3, it would be better to
separately explain why we need session-level locks instead of
transaction level.

5. Add the below comments in the code:
diff --git a/src/backend/replication/logical/applyparallelworker.c
b/src/backend/replication/logical/applyparallelworker.c
index 9385afb6d2..56f00defcf 100644
--- a/src/backend/replication/logical/applyparallelworker.c
+++ b/src/backend/replication/logical/applyparallelworker.c
@@ -431,6 +431,9 @@ pa_free_worker_info(ParallelApplyWorkerInfo *winfo)
if (winfo->dsm_seg != NULL)
dsm_detach(winfo->dsm_seg);

+   /*
+* Ensure this worker information won't be reused during
worker allocation.
+*/
ParallelApplyWorkersList = list_delete_ptr(ParallelApplyWorkersList,

winfo);

@@ -762,6 +765,10 @@
HandleParallelApplyMessage(ParallelApplyWorkerInfo *winfo, StringInfo
msg)
 */
error_context_stack = apply_error_context_stack;

+   /*
+* The actual error must be already
reported by parallel apply
+* worker.
+*/
ereport(ERROR,

(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 errmsg("parallel
apply worker exited abnormally"),




-- 
With Regards,
Amit Kapila.




Re: Hash index build performance tweak from sorting

2022-11-23 Thread Simon Riggs
On Wed, 23 Nov 2022 at 13:04, David Rowley  wrote:

> After getting rid of the  HashInsertState code and just adding bool
> sorted to _hash_doinsert() and _hash_pgaddtup(), the resulting patch
> is much more simple:

Seems good to me and I wouldn't argue with any of your comments.

> and v4 includes 7 extra lines in hashinsert.c for the Assert() I
> mentioned in my previous email plus a bunch of extra comments.

Oh, I did already include that in v3 as requested.

> I'd rather see this solved like v4 is doing it.

Please do. No further comments. Thanks for your help

-- 
Simon Riggs        http://www.EnterpriseDB.com/




Re: Prefetch the next tuple's memory during seqscans

2022-11-23 Thread David Rowley
On Wed, 23 Nov 2022 at 21:26, sirisha chamarthi
 wrote:
> Master
> After vacuum:
> latency average = 393.880 ms
>
> Master + 0001 + 0005
> After vacuum:
> latency average = 369.591 ms

Thank you for running those again.  Those results make more sense.
Would you mind also testing the count(*) query too?

David




Re: Hash index build performance tweak from sorting

2022-11-23 Thread David Rowley
On Fri, 18 Nov 2022 at 03:34, Tomas Vondra
 wrote:
> I did some simple benchmark with v2 and v3, using the attached script,
> which essentially just builds hash index on random data, with different
> data types and maintenance_work_mem values. And what I see is this
> (median of 10 runs):

> So to me it seems v2 performs demonstrably better, v3 is consistently
> slower - not only compared to v2, but often also to master.

Could this just be down to code alignment changes?  There does not
really seem to be any fundamental differences which would explain
this.

David




Re: Hash index build performance tweak from sorting

2022-11-23 Thread David Rowley
On Wed, 16 Nov 2022 at 17:33, Simon Riggs  wrote:
>
> Thanks for the review, apologies for the delay in acting upon your comments.
>
> My tests show the sorted and random tests are BOTH 4.6% faster with
> the v3 changes using 5-test avg, but you'll be pleased to know your
> kit is about 15.5% faster than mine, comparing absolute execution
> times.

Thanks for the updated patch.

I started to look at this again and I'm starting to think that the
HashInsertState struct is the wrong approach for passing along the
sorted flag to _hash_doinsert().  The reason I think this is that in
hashbuild() when setting buildstate.spool to NULL, you're also making
the decision about what to set the sorted flag to.  However, in
reality, we already know what we should be passing *every* time we
call _hash_doinsert().  The only place where we can pass the sorted
option as true is in _h_indexbuild() when we're doing the sorted
version of the index build. Trying to make that decision any sooner
seems error-prone.

I understand you have made HashInsertState so that we don't need to
add new parameters should we ever need to pass something else along,
but I'm just thinking that if we ever need to add more, then we should
just reconsider this in the future.  I think for today, the better
option is just to add the bool sorted as a parameter to
_hash_doinsert() and pass it as true in the single case where it's
valid to do so.  That way it seems less likely that we'll inherit some
option from some other place after some future modification and end
up passing sorted as true when it should be false.
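
For reference, a minimal sketch of the signature change this implies
(hypothetical prototype; the v4 diff further below shows the actual
call sites):

/* sketch: sorted is passed explicitly at every call site */
extern void _hash_doinsert(Relation rel, IndexTuple itup,
						   Relation heapRel, bool sorted);

_hash_doinsert(rel, itup, heapRel, false);	/* hashinsert() and unsorted build */
_hash_doinsert(rel, itup, heapRel, true);	/* only from _h_indexbuild() */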

Another reason I didn't like the HashInsertState idea is that in the
v3 patch there's a HashInsertState in both HashBuildState and HSpool.
Because in the normal insert path (hashinsert), we have neither a
HashBuildState nor an HSpool, you're having to fake up a
HashInsertStateData to pass something along to _hash_doinsert() in
hashinsert(). When we're building an index, in the non-sorted index
build case, you're always passing the HashInsertStateData from the
HashBuildState, but when we're doing the sorted index build the one
from HSpool is passed.  In other words, in each of the 3 calls to
_hash_doinsert(), the HashInsertStateData comes from a different
place.

Now, I do see that you've coded hashbuild() so both versions of the
HashInsertState point to the same HashInsertStateData, but I find it
unacceptable programming that in _h_spoolinit() the code palloc's the
memory for the HSpool and you're setting the istate field to the
HashInsertStateData that's on the stack. That just seems like a great
way to end up having istate pointing to junk should the HSpool ever
live beyond the hashbuild() call. If we really don't want HSpool to
live beyond hashbuild(), then it too should be a local variable to
hashbuild() instead of being palloc'ed in _h_spoolinit().
_h_spoolinit() could just be passed a pointer to the HSpool to
populate.

After getting rid of the  HashInsertState code and just adding bool
sorted to _hash_doinsert() and _hash_pgaddtup(), the resulting patch
is much more simple:

v3:
 src/backend/access/hash/hash.c   | 19 ---
 src/backend/access/hash/hashinsert.c | 40
++--
 src/backend/access/hash/hashsort.c   |  8 ++--
 src/include/access/hash.h| 14 +++---
 4 files changed, 67 insertions(+), 14 deletions(-)

v4:
src/backend/access/hash/hash.c   |  4 ++--
src/backend/access/hash/hashinsert.c | 40 
src/backend/access/hash/hashsort.c   |  3 ++-
src/include/access/hash.h|  6 --
4 files changed, 40 insertions(+), 13 deletions(-)

and v4 includes 7 extra lines in hashinsert.c for the Assert() I
mentioned in my previous email plus a bunch of extra comments.

I'd rather see this solved like v4 is doing it.

David
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index c361509d68..77fd147f68 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -231,7 +231,7 @@ hashbuildCallback(Relation index,
itup = index_form_tuple(RelationGetDescr(index),
index_values, 
index_isnull);
itup->t_tid = *tid;
-   _hash_doinsert(index, itup, buildstate->heapRel);
+   _hash_doinsert(index, itup, buildstate->heapRel, false);
pfree(itup);
}
 
@@ -265,7 +265,7 @@ hashinsert(Relation rel, Datum *values, bool *isnull,
itup = index_form_tuple(RelationGetDescr(rel), index_values, 
index_isnull);
itup->t_tid = *ht_ctid;
 
-   _hash_doinsert(rel, itup, heapRel);
+   _hash_doinsert(rel, itup, heapRel, false);
 
pfree(itup);
 
diff --git a/src/backend/access/hash/hashinsert.c 
b/src/backend/access/hash/hashinsert.c
index 23907d2e5b..6718ff18f3 100644
--- a/src/backend/access/hash/hashinsert.c
+++ 

Bug in MERGE's test for tables with rules

2022-11-23 Thread Dean Rasheed
While playing around with rules and MERGE, I noticed that there is a
bug in the way that it detects whether the target table has rules ---
it uses rd_rel->relhasrules, which can be incorrect, since it might be
set for a table that doesn't currently have rules, but did in the
recent past.

So it actually needs to examine rd_rules. Technically, I think that it
would be sufficient to just test whether rd_rules is non-NULL, but I
think it's more robust and readable to check rd_rules->numLocks, as in
the attached patch.

Regards,
Dean
diff --git a/src/backend/parser/parse_merge.c b/src/backend/parser/parse_merge.c
new file mode 100644
index 7913523..62c2ff6
--- a/src/backend/parser/parse_merge.c
+++ b/src/backend/parser/parse_merge.c
@@ -182,7 +182,8 @@ transformMergeStmt(ParseState *pstate, M
  errmsg("cannot execute MERGE on relation \"%s\"",
 		RelationGetRelationName(pstate->p_target_relation)),
  errdetail_relkind_not_supported(pstate->p_target_relation->rd_rel->relkind)));
-	if (pstate->p_target_relation->rd_rel->relhasrules)
+	if (pstate->p_target_relation->rd_rules != NULL &&
+		pstate->p_target_relation->rd_rules->numLocks > 0)
 		ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("cannot execute MERGE on relation \"%s\"",
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
new file mode 100644
index 624d0e5..ad1a8c9
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -3532,6 +3532,30 @@ MERGE INTO rule_merge2 t USING (SELECT 1
 		DELETE
 	WHEN NOT MATCHED THEN
 		INSERT VALUES (s.a, '');
+SELECT * FROM rule_merge2;
+ a | b 
+---+---
+ 1 | 
+(1 row)
+
+-- and should be ok after dropping rules
+DROP RULE rule1 ON rule_merge1;
+DROP RULE rule2 ON rule_merge1;
+DROP RULE rule3 ON rule_merge1;
+MERGE INTO rule_merge1 t USING (SELECT 1 AS a) s
+	ON t.a = s.a
+	WHEN MATCHED AND t.a < 2 THEN
+		UPDATE SET b = b || ' updated by merge, had rules'
+	WHEN MATCHED AND t.a > 2 THEN
+		DELETE
+	WHEN NOT MATCHED THEN
+		INSERT VALUES (s.a, 'had rules');
+SELECT * FROM rule_merge1;
+ a | b 
+---+---
+ 1 | had rules
+(1 row)
+
 --
 -- Test enabling/disabling
 --
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
new file mode 100644
index bfb5f3b..8a728be
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -1264,6 +1264,21 @@ MERGE INTO rule_merge2 t USING (SELECT 1
 		DELETE
 	WHEN NOT MATCHED THEN
 		INSERT VALUES (s.a, '');
+SELECT * FROM rule_merge2;
+
+-- and should be ok after dropping rules
+DROP RULE rule1 ON rule_merge1;
+DROP RULE rule2 ON rule_merge1;
+DROP RULE rule3 ON rule_merge1;
+MERGE INTO rule_merge1 t USING (SELECT 1 AS a) s
+	ON t.a = s.a
+	WHEN MATCHED AND t.a < 2 THEN
+		UPDATE SET b = b || ' updated by merge, had rules'
+	WHEN MATCHED AND t.a > 2 THEN
+		DELETE
+	WHEN NOT MATCHED THEN
+		INSERT VALUES (s.a, 'had rules');
+SELECT * FROM rule_merge1;
 
 --
 -- Test enabling/disabling


Another multi-row VALUES bug

2022-11-23 Thread Dean Rasheed
I was thinking some more about the recent fix to multi-row VALUES
handling in the rewriter (b8f2687fdc), and I realised that there is
another bug in the way DEFAULT values are handled:

In RewriteQuery(), the code assumes that in a multi-row INSERT query,
the VALUES RTE will be the only thing in the query's fromlist. That's
true for the original query, but it's not necessarily the case for
product queries, if the rule action performs a multi-row insert,
leading to a new VALUES RTE that the DEFAULT-handling code might fail
to process. For example:

CREATE TABLE foo(a int);
INSERT INTO foo VALUES (1);

CREATE TABLE foo_log(t timestamptz DEFAULT now(), a int, c text);
CREATE RULE foo_r AS ON UPDATE TO foo
  DO ALSO INSERT INTO foo_log VALUES (DEFAULT, old.a, 'old'),
 (DEFAULT, new.a, 'new');

UPDATE foo SET a = 2 WHERE a = 1;

ERROR:  unrecognized node type: 43

There's a similar example to this in the regression tests, but it
doesn't test DEFAULT-handling.

It's also possible for the current code to cause the same VALUES RTE
to be rewritten multiple times, when recursing into product queries
(if the rule action doesn't add any more stuff to the query's
fromlist). That turns out to be harmless, because the second time
round it will no longer contain any defaults, but it's technically
incorrect, and certainly a waste of cycles.

So I think what the code needs to do is examine the targetlist, and
identify the VALUES RTE that the current query is using as a source,
and rewrite just that RTE (so any original VALUES RTE is rewritten at
the top level, and any VALUES RTEs from rule actions are rewritten
while recursing, and none are rewritten more than once), as in the
attached patch.

While at it, I noticed an XXX code comment questioning whether any of
this applies to MERGE. The answer is "no", because MERGE actions don't
allow multi-row inserts, so I think it's worth updating that comment
to make that clearer.

Regards,
Dean
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
new file mode 100644
index fb0c687..5ec4b91
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -3743,26 +3743,34 @@ RewriteQuery(Query *parsetree, List *rew
 		 */
 		if (event == CMD_INSERT)
 		{
+			ListCell   *lc2;
 			RangeTblEntry *values_rte = NULL;
 
 			/*
 			 * If it's an INSERT ... VALUES (...), (...), ... there will be a
-			 * single RTE for the VALUES targetlists.
+			 * unique VALUES RTE referred to by the targetlist.  At the top
+			 * level, this will be the VALUES lists from the original query,
+			 * and while recursively processing each product query, it will be
+			 * the VALUES lists from the rule action, if any.  As with
+			 * rewriteValuesRTE(), we need only worry about targetlist entries
+			 * that contain simple Vars referencing the VALUES RTE.
 			 */
-			if (list_length(parsetree->jointree->fromlist) == 1)
+			foreach(lc2, parsetree->targetList)
 			{
-RangeTblRef *rtr = (RangeTblRef *) linitial(parsetree->jointree->fromlist);
+TargetEntry *tle = (TargetEntry *) lfirst(lc2);
 
-if (IsA(rtr, RangeTblRef))
+if (IsA(tle->expr, Var))
 {
-	RangeTblEntry *rte = rt_fetch(rtr->rtindex,
+	Var		   *var = (Var *) tle->expr;
+	RangeTblEntry *rte = rt_fetch(var->varno,
   parsetree->rtable);
 
 	if (rte->rtekind == RTE_VALUES)
 	{
 		values_rte = rte;
-		values_rte_index = rtr->rtindex;
+		values_rte_index = var->varno;
 	}
+	break;
 }
 			}
 
@@ -3837,7 +3845,11 @@ RewriteQuery(Query *parsetree, List *rew
 		break;
 	case CMD_UPDATE:
 	case CMD_INSERT:
-		/* XXX is it possible to have a VALUES clause? */
+
+		/*
+		 * MERGE actions do not permit multi-row INSERTs, so
+		 * there is no VALUES RTE to deal with here.
+		 */
 		action->targetList =
 			rewriteTargetListIU(action->targetList,
 action->commandType,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
new file mode 100644
index 624d0e5..9c28dd9
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2938,11 +2938,11 @@ select pg_get_viewdef('shoe'::regclass,0
 --
 -- check multi-row VALUES in rules
 --
-create table rules_src(f1 int, f2 int);
-create table rules_log(f1 int, f2 int, tag text);
+create table rules_src(f1 int, f2 int default 0);
+create table rules_log(f1 int, f2 int, tag text, id serial);
 insert into rules_src values(1,2), (11,12);
 create rule r1 as on update to rules_src do also
-  insert into rules_log values(old.*, 'old'), (new.*, 'new');
+  insert into rules_log values(old.*, 'old', default), (new.*, 'new', default);
 update rules_src set f2 = f2 + 1;
 update rules_src set f2 = f2 * 10;
 select * from rules_src;
@@ -2953,16 +2953,16 @@ select * from rules_src;
 (2 rows)
 
 select * from rules_log;
- f1 | f2  | tag 
-+-+-
-  1 |   2 

Re: [PATCH] Enable using llvm jitlink as an alternative llvm jit linker of old Rtdyld.

2022-11-23 Thread David Rowley
On Wed, 23 Nov 2022 at 23:13, Alex Fan  wrote:
> I am new to the postgres community and apologise for resending this as the 
> previous one didn't include patch properly and didn't cc reviewers (maybe the 
> reason it has been buried in mailing list for months)

Welcome to the community!

I've not looked at your patch, but I have noticed that you have
assigned some reviewers to the CF entry yourself.  Unless these people
know about that, this is likely a bad choice. People usually opt to
review patches of their own accord rather than because the patch
author put their name on the reviewer list.

There are a few reasons that the patch might not be getting much attention:

1. The CF entry ([1]) states that the patch is "Waiting on Author".
If you've done what you need to do, and are waiting for review, "Needs
review" might be a better state. Currently people browsing the CF app
will assume you need to do more work before it's worth looking at your
patch.
2. The CF entry already has reviewers listed.  People looking for a
patch to review are probably more likely to pick one with no reviewers
listed as they'd expect the existing listed reviewers to be taking
care of reviews for a particular patch. The latter might be unlikely
to happen given you've assigned reviewers yourself without asking them
(at least you didn't ask me after you put me on the list).
3. Target version is 17.  What's the reason for that? The next version is 16.

I'd recommend setting the patch to "Needs review" and removing all the
reviewers that have not confirmed to you that they'll review the
patch.  I'd also leave the target version blank or set it to 16.

There might be a bit more useful information for you in [2].

David

[1] https://commitfest.postgresql.org/40/3857/
[2] https://wiki.postgresql.org/wiki/Submitting_a_Patch




Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency

2022-11-23 Thread Alvaro Herrera
On 2022-Nov-23, Alvaro Herrera wrote:

> I suggest that we could improve that elog() so that it includes the
> members of the multixact in question, which could help us better
> understand what is going on.

Something like the attached.  It would result in output like this:
WARNING:  new multixact has more than one updating member: 0 2[17378 (keysh), 
17381 (nokeyupd)]

Then it should be possible to trace (in pg_waldump output) the
operations of each of the transactions that have any status in the
multixact that includes some form of "upd".

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Just treat us the way you want to be treated + some extra allowance
 for ignorance."(Michael Brusser)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 204aa95045..e1191a7564 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -799,7 +799,8 @@ MultiXactIdCreateFromMembers(int nmembers, MultiXactMember *members)
 			if (ISUPDATE_from_mxstatus(members[i].status))
 			{
 if (has_update)
-	elog(ERROR, "new multixact has more than one updating member");
+	elog(ERROR, "new multixact has more than one updating member: %s",
+		 mxid_to_string(InvalidMultiXactId, nmembers, members));
 has_update = true;
 			}
 		}


Re: Perform streaming logical transactions by background workers and parallel apply

2022-11-23 Thread Amit Kapila
On Tue, Nov 22, 2022 at 7:23 PM Hayato Kuroda (Fujitsu)
 wrote:
>
>
> 07. proto.c - logicalrep_write_stream_abort()
>
> We may able to add assertions for abort_lsn and abort_time, like xid and 
> subxid.
>

If you see logicalrep_write_stream_commit(), we have an assertion for
xid but not for LSN and other parameters. I think the current coding
in the patch is consistent with that.

>
> 08. guc_tables.c - ConfigureNamesInt
>
> ```
> _sync_workers_per_subscription,
> +   2, 0, MAX_PARALLEL_WORKER_LIMIT,
> +   NULL, NULL, NULL
> +   },
> ```
>
> The upper limit for max_sync_workers_per_subscription seems to be wrong, it 
> should
> be used for max_parallel_apply_workers_per_subscription.
>

Right, I don't know why this needs to be changed in the first place.


-- 
With Regards,
Amit Kapila.




Re: [PATCH] Enable using llvm jitlink as an alternative llvm jit linker of old Rtdyld.

2022-11-23 Thread Alex Fan
Hi,

I am new to the postgres community and apologise for resending this as the
previous one didn't include patch properly and didn't cc reviewers (maybe
the reason it has been buried in mailing list for months)

Adding to previous email, this patch exposes its own C API for creating
ObjectLinkingLayer in a similar fashion as
LLVMOrcCreateRTDyldObjectLinkingLayerWithSectionMemoryManager since orc
doesn't expose it yet.

Thanks and really appreciate if someone can offer a review to this and help
get it merged.

Cheers,
Alex


On Mon, Aug 29, 2022 at 5:46 PM Alex Fan  wrote:

> This brings the bonus of support jitting on riscv64 (included in this
> patch)
> and other platforms Rtdyld doesn't support, e.g. windows COFF.
>
> Currently, llvm doesn't expose jitlink (ObjectLinkingLayer) via C API, so
> a wrapper is added. This also adds minor llvm 15 compat fix that is needed
> ---
>  config/llvm.m4|  1 +
>  src/backend/jit/llvm/llvmjit.c| 67 +--
>  src/backend/jit/llvm/llvmjit_wrap.cpp | 35 ++
>  src/include/jit/llvmjit.h |  9 
>  4 files changed, 108 insertions(+), 4 deletions(-)
>
> diff --git a/config/llvm.m4 b/config/llvm.m4
> index 3a75cd8b4d..a31b8b304a 100644
> --- a/config/llvm.m4
> +++ b/config/llvm.m4
> @@ -75,6 +75,7 @@ AC_DEFUN([PGAC_LLVM_SUPPORT],
>engine) pgac_components="$pgac_components $pgac_component";;
>debuginfodwarf) pgac_components="$pgac_components $pgac_component";;
>orcjit) pgac_components="$pgac_components $pgac_component";;
> +  jitlink) pgac_components="$pgac_components $pgac_component";;
>passes) pgac_components="$pgac_components $pgac_component";;
>native) pgac_components="$pgac_components $pgac_component";;
>perfjitevents) pgac_components="$pgac_components $pgac_component";;
> diff --git a/src/backend/jit/llvm/llvmjit.c
> b/src/backend/jit/llvm/llvmjit.c
> index 6c72d43beb..d8b840da8c 100644
> --- a/src/backend/jit/llvm/llvmjit.c
> +++ b/src/backend/jit/llvm/llvmjit.c
> @@ -229,6 +229,11 @@ llvm_release_context(JitContext *context)
>  LLVMModuleRef
>  llvm_mutable_module(LLVMJitContext *context)
>  {
> +#ifdef __riscv
> +   const char* abiname;
> +   const char* target_abi = "target-abi";
> +   LLVMMetadataRef abi_metadata;
> +#endif
> llvm_assert_in_fatal_section();
>
> /*
> @@ -241,6 +246,40 @@ llvm_mutable_module(LLVMJitContext *context)
> context->module = LLVMModuleCreateWithName("pg");
> LLVMSetTarget(context->module, llvm_triple);
> LLVMSetDataLayout(context->module, llvm_layout);
> +#ifdef __riscv
> +#if __riscv_xlen == 64
> +#ifdef __riscv_float_abi_double
> +   abiname = "lp64d";
> +#elif defined(__riscv_float_abi_single)
> +   abiname = "lp64f";
> +#else
> +   abiname = "lp64";
> +#endif
> +#elif __riscv_xlen == 32
> +#ifdef __riscv_float_abi_double
> +   abiname = "ilp32d";
> +#elif defined(__riscv_float_abi_single)
> +   abiname = "ilp32f";
> +#else
> +   abiname = "ilp32";
> +#endif
> +#else
> +   elog(ERROR, "unsupported riscv xlen %d", __riscv_xlen);
> +#endif
> +   /*
> +* set this manually to avoid llvm defaulting to soft
> float and
> +* resulting in linker error: `can't link double-float
> modules
> +* with soft-float modules`
> +* we could set this for TargetMachine via MCOptions, but
> there
> +* is no C API for it
> +* ref:
> https://github.com/llvm/llvm-project/blob/afa520ab34803c82587ea6759bfd352579f741b4/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp#L90
> +*/
> +   abi_metadata = LLVMMDStringInContext2(
> +   LLVMGetModuleContext(context->module),
> +   abiname, strlen(abiname));
> +   LLVMAddModuleFlag(context->module,
> LLVMModuleFlagBehaviorOverride,
> +   target_abi, strlen(target_abi), abi_metadata);
> +#endif
> }
>
> return context->module;
> @@ -786,6 +825,8 @@ llvm_session_initialize(void)
> char   *error = NULL;
> char   *cpu = NULL;
> char   *features = NULL;
> +   LLVMRelocMode reloc=LLVMRelocDefault;
> +   LLVMCodeModel codemodel=LLVMCodeModelJITDefault;
> LLVMTargetMachineRef opt0_tm;
> LLVMTargetMachineRef opt3_tm;
>
> @@ -820,16 +861,21 @@ llvm_session_initialize(void)
> elog(DEBUG2, "LLVMJIT detected CPU \"%s\", with features \"%s\"",
>  cpu, features);
>
> +#ifdef __riscv
> +   reloc=LLVMRelocPIC;
> +   codemodel=LLVMCodeModelMedium;
> +#endif
> +
> opt0_tm =
> LLVMCreateTargetMachine(llvm_targetref, llvm_triple, cpu,
> features,
>
> LLVMCodeGenLevelNone,
> -
>  LLVMRelocDefault,
> -
>  LLVMCodeModelJITDefault);
> + 

Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2022-11-23 Thread Thomas Munro
On Wed, Nov 23, 2022 at 2:42 PM Andres Freund  wrote:
> The failure has to be happening in wait_for_postmaster_promote(), because the
> standby2 is actually successfully promoted.

I assume this is ext4.  Presumably anything that reads the
controlfile, like pg_ctl, pg_checksums, pg_resetwal,
pg_control_system(), ... by reading without interlocking against
writes could see garbage.  I have lost track of the versions and the
thread, but I worked out at some point by experimentation that this
only started relatively recently for concurrent read() and write(),
but always happened with concurrent pread() and pwrite().  The control
file uses the non-p variants, which previously didn't mash old/new data
like grated cheese under concurrency due to some implementation detail,
but now do.
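
To illustrate the hazard (this is not the fix being discussed), a
reader of pg_control could in principle defend itself by verifying the
CRC and retrying; a minimal sketch, assuming PostgreSQL's usual headers
and the pg_crc32c macros:

/* Sketch only: re-read until the CRC over the leading fields matches. */
#include "postgres_fe.h"		/* or "postgres.h" in the backend */
#include "catalog/pg_control.h"
#include "port/pg_crc32c.h"
#include <unistd.h>

static bool
read_controlfile_consistent(int fd, ControlFileData *control)
{
	for (int attempt = 0; attempt < 10; attempt++)
	{
		pg_crc32c	crc;

		if (pread(fd, control, sizeof(ControlFileData), 0) !=
			sizeof(ControlFileData))
			return false;		/* short read or I/O error */

		INIT_CRC32C(crc);
		COMP_CRC32C(crc, (char *) control, offsetof(ControlFileData, crc));
		FIN_CRC32C(crc);

		if (EQ_CRC32C(crc, control->crc))
			return true;		/* got a consistent copy */
	}
	return false;				/* persistent mismatch: likely real corruption */
}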




Re: [PoC] Federated Authn/z with OAUTHBEARER

2022-11-23 Thread mahendrakar s
Hi,


We validated libpq handling OAuth natively with different flows and
different OIDC-certified providers.

Flows: Device Code, Client Credentials and Refresh Token.
Providers: Microsoft, Google and Okta.
Also validated with OAuth provider Github.

We propose using OpenID Connect (OIDC) as the protocol, instead of
plain OAuth, because:
- It provides a discovery mechanism to bridge provider differences and
supply metadata.
- It has a stricter protocol and certification process, so we can
reliably identify which providers can be supported.
- OIDC is designed for authentication, while the main purpose of OAuth
is to authorize applications on behalf of the user.

Github is not OIDC certified, so won’t be supported with this proposal.
However, it may be supported in the future through the ability for the
extension to provide custom discovery document content.

OpenID Connect defines a well-known discovery mechanism for the
provider configuration URI. It allows libpq to fetch metadata about
the provider (i.e. endpoints, supported grants, response types, etc.).

In the attached patch (based on the V2 patch in the thread; it does not
contain Samay's changes):
- The provider can configure the issuer URL and scope through the options hook.
- The server passes an open discovery URL and scope on to libpq.
- Libpq handles the OAuth flow based on the flow_type sent in the
connection string [1].
- Callbacks were added to notify client tools via a structure if the
OAuth flow requires user interaction.
- The backend uses hooks to validate the bearer token.

Note that authentication code flow with PKCE for GUI clients is not
implemented yet.

Proposed next steps:
- Broaden discussion to reach agreement on the approach.
- Implement libpq changes without iddawc
- Prototype GUI flow with pgAdmin

Thanks,
Mahendrakar.

[1]:
connection string for refresh token flow:
./psql -U  -d 'dbname=postgres oauth_client_id=
oauth_flow_type=  oauth_refresh_token='

On Mon, 3 Oct 2022 at 23:34, Andrey Chudnovsky  wrote:
>
> > I think we can probably prototype a callback hook for approach (1)
> > pretty quickly. (2) is a lot more work and investigation, but it's
> > work that I'm interested in doing (when I get the time). I think there
> > are other very good reasons to consider a third-party SASL library,
> > and some good lessons to be learned, even if the community decides not
> > to go down that road.
>
> Makes sense. We will work on (1.) and do some check if there are any
> blockers for a shared solution to support github and google.
>
> On Fri, Sep 30, 2022 at 1:45 PM Jacob Champion  
> wrote:
> >
> > On Fri, Sep 30, 2022 at 7:47 AM Andrey Chudnovsky
> >  wrote:
> > > > How should we communicate those pieces to a custom client when it's
> > > > passing a token directly? The easiest way I can see is for the custom
> > > > client to speak the OAUTHBEARER protocol directly (e.g. SASL plugin).
> > > > If you had to parse the libpq error message, I don't think that'd be
> > > > particularly maintainable.
> > >
> > > I agree that parsing the message is not a sustainable way.
> > > Could you provide more details on the SASL plugin approach you propose?
> > >
> > > Specifically, is this basically a set of extension hooks for the client 
> > > side?
> > > With the need for the client to be compiled with the plugins based on
> > > the set of providers it needs.
> >
> > That's a good question. I can see two broad approaches, with maybe
> > some ability to combine them into a hybrid:
> >
> > 1. If there turns out to be serious interest in having libpq itself
> > handle OAuth natively (with all of the web-facing code that implies,
> > and all of the questions still left to answer), then we might be able
> > to provide a "token hook" in the same way that we currently provide a
> > passphrase hook for OpenSSL keys. By default, libpq would use its
> > internal machinery to take the provider details, navigate its builtin
> > flow, and return the Bearer token. If you wanted to override that
> > behavior as a client, you could replace the builtin flow with your
> > own, by registering a set of callbacks.
> >
> > 2. Alternatively, OAuth support could be provided via a mechanism
> > plugin for some third-party SASL library (GNU libgsasl, Cyrus
> > libsasl2). We could provide an OAuth plugin in contrib that handles
> > the default flow. Other providers could publish their alternative
> > plugins to completely replace the OAUTHBEARER mechanism handling.
> >
> > Approach (2) would make for some duplicated effort since every
> > provider has to write code to speak the OAUTHBEARER protocol. It might
> > simplify provider-specific distribution, since (at least for Cyrus) I
> > think you could build a single plugin that supports both the client
> > and server side. But it would be a lot easier to unknowingly (or
> > knowingly) break the spec, since you'd control both the client and
> > server sides. There would be less incentive to interoperate.
> >
> > Finally, we could potentially take pieces 

Re: Logical Replication Custom Column Expression

2022-11-23 Thread Stavros Koureas
Just one correction for the subscriber
On Subscriber:

test_sub=# CREATE TABLE tab(id int *pkey*, description varchar,
subscription varchar *pkey*);
CREATE TABLE

The subscriber table should have the same primary key columns as the
publisher's, plus one more.
We need to make sure that on update only data from the same origin is
updated.

Στις Τετ 23 Νοε 2022 στις 1:24 π.μ., ο/η Peter Smith 
έγραψε:

> On Wed, Nov 23, 2022 at 7:38 AM Stavros Koureas
>  wrote:
> >
> > Reading more carefully what you described, I think you are interested in
> getting something you call origin from publishers, probably some metadata
> from the publications.
> >
> > This identifier in those metadata maybe does not have business value on
> the reporting side. The idea is to use a value which has specific meaning
> to the user at the end.
> >
> > For example assigning 1 for tenant 1, 2 for tenant 2 and so on, at the
> end based on a dimension table which holds this mapping the user would be
> able to filter the data. So programmatically the user can set the id value
> of the column plus creating the mapping table from an application let’s say
> and be able to distinguish the data.
> >
> > In addition this column should have the ability to be part of the
> primary key on the subscription table in order to not conflict with lines
> from other tenants having the same keys.
> >
> >
>
> I was wondering if a simpler syntax solution might also work here.
>
> Imagine another SUBSCRIPTION parameter that indicates to write the
> *name* of the subscription to some pre-defined table column:
> e.g. CREATE SUBSCRIPTION subname FOR PUBLICATION pub_tenant_1
> CONNECTION '...' WITH (subscription_column);
>
> Logical Replication already allows the subscriber table to have extra
> columns, so you just need to manually create the extra 'subscription'
> column up-front.
>
> Then...
>
> ~~
>
> On Publisher:
>
> test_pub=# CREATE TABLE tab(id int primary key, description varchar);
> CREATE TABLE
>
> test_pub=# INSERT INTO tab VALUES (1,'one'),(2,'two'),(3,'three');
> INSERT 0 3
>
> test_pub=# CREATE PUBLICATION tenant1 FOR ALL TABLES;
> CREATE PUBLICATION
>
> ~~
>
> On Subscriber:
>
> test_sub=# CREATE TABLE tab(id int, description varchar, subscription
> varchar);
> CREATE TABLE
>
> test_sub=# CREATE SUBSCRIPTION sub_tenant1 CONNECTION 'host=localhost
> dbname=test_pub' PUBLICATION tenant1 WITH (subscription_column);
> CREATE SUBSCRIPTION
>
> test_sub=# SELECT * FROM tab;
>  id | description | subscription
> ----+-------------+--------------
>   1 | one         | sub_tenant1
>   2 | two         | sub_tenant1
>   3 | three       | sub_tenant1
> (3 rows)
>
> ~~
>
> Subscriptions to different tenants would be named differently.
>
> And using other SQL you can map/filter those names however your
> application wants.
>
> --
> Kind Regards,
> Peter Smith.
> Fujitsu Australia
>


Re: Logical Replication Custom Column Expression

2022-11-23 Thread Stavros Koureas
It's easy to answer this question.

Imagine a software company that sells an ERP product and also offers
reporting solutions; the ERP tables will not have this additional
column in all the tables.
Now the reporting department needs to consolidate all that data from
different databases (publishers) into one multitenant database holding
all the data.
So in an ERP like NAV, or anything else, you cannot suggest changing
all the code for all of the tables plus all functions to add one
additional column. Even if that were possible, you could not work with
integers but would need to work with GUIDs, as this column would have
to be predefined for each ERP. Joining on GUIDs in the second,
reporting phase will definitely slow down performance.

In summary:

   1. Cannot touch the underlying source (important)
   2. GUID identifier column will slow down the reporting performance


Στις Τετ 23 Νοε 2022 στις 5:19 π.μ., ο/η Amit Kapila <
amit.kapil...@gmail.com> έγραψε:

> On Wed, Nov 23, 2022 at 1:40 AM Stavros Koureas
>  wrote:
> >
> > Reading more carefully what you described, I think you are interested in
> getting something you call origin from publishers, probably some metadata
> from the publications.
> >
> > This identifier in those metadata maybe does not have business value on
> the reporting side. The idea is to use a value which has specific meaning
> to the user at the end.
> >
> > For example assigning 1 for tenant 1, 2 for tenant 2 and so on, at the
> end based on a dimension table which holds this mapping the user would be
> able to filter the data. So programmatically the user can set the id value
> of the column plus creating the mapping table from an application let’s say
> and be able to distinguish the data.
> >
>
> In your example, are different tenants represent different publisher
> nodes? If so, why can't we have a predefined column and value for the
> required tables on each publisher rather than logical replication
> generate that value while replicating data?
>
> --
> With Regards,
> Amit Kapila.
>


Re: Non-decimal integer literals

2022-11-23 Thread John Naylor
On Wed, Nov 23, 2022 at 3:54 PM David Rowley  wrote:
>
> Going by [1], clang will actually use multiplication by 16 to
> implement the former. gcc is better and shifts left by 4, so likely
> won't improve things for gcc.  It seems worth doing it this way for
> anything that does not have HAVE__BUILTIN_OP_OVERFLOW anyway.

FWIW, gcc 12.2 generates an imul on my system when compiling in situ. I've
found it useful to run godbolt locally* and load the entire PG file (nicer
to read than plain objdump) -- compilers can make different decisions when
going from isolated snippets to within full functions.

* clone from https://github.com/compiler-explorer/compiler-explorer
install npm 16
run "make" and when finished will show the localhost url
add the right flags, which in this case was

-Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
-Werror=vla -Wendif-labels -Wmissing-format-attribute
-Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security
-fno-strict-aliasing -fwrapv -fexcess-precision=standard
-Wno-format-truncation -Wno-stringop-truncation -O2
-I/path/to/srcdir/src/include -I/path/to/builddir/src/include  -D_GNU_SOURCE

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency

2022-11-23 Thread Alvaro Herrera
Hello Dimos

On 2022-Nov-22, Dimos Stamatakis wrote:

> When running tpcc on sysbench with high concurrency (96 threads, scale
> factor 5) we realized that a fix for visibility check (introduced in
> PG-14.5) causes sysbench to fail in 1 out of 70 runs.
> The error is the following:
> 
> SQL error, errno = 0, state = 'XX000': new multixact has more than one 
> updating member

Ouch.

I did not remember any reports of this.  Searching I found this recent
one:
https://postgr.es/m/17518-04e368df5ad7f...@postgresql.org

However, the reporter there says they're using 12.10, and according to
src/tools/git_changelog the commit appeared only in 12.12:

Author: Heikki Linnakangas 
Branch: master Release: REL_15_BR [adf6d5dfb] 2022-06-27 08:21:08 +0300
Branch: REL_14_STABLE Release: REL_14_5 [e24615a00] 2022-06-27 08:24:30 +0300
Branch: REL_13_STABLE Release: REL_13_8 [7ba325fd7] 2022-06-27 08:24:35 +0300
Branch: REL_12_STABLE Release: REL_12_12 [af530898e] 2022-06-27 08:24:36 +0300
Branch: REL_11_STABLE Release: REL_11_17 [b49889f3c] 2022-06-27 08:24:37 +0300
Branch: REL_10_STABLE Release: REL_10_22 [4822b4627] 2022-06-27 08:24:38 +0300

Fix visibility check when XID is committed in CLOG but not in procarray.
[...]


Thinking further, one problem in tracking this down is that at this
point the multixact in question is *being created*, so we don't have a
WAL trail we could trace through.

I suggest that we could improve that elog() so that it includes the
members of the multixact in question, which could help us better
understand what is going on.

> This commit was supposed to fix a race condition during the visibility
> check. Please let us know whether you are aware of this issue and if
> there is a quick fix.

I don't think so.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Logical replication missing information

2022-11-23 Thread Boboc Cristi
Hello! Thank you very much, the information you just sent to me is very, very
valuable!
Best regards, Cristi Boboc 

On Wednesday, November 23, 2022 at 04:45:09 AM GMT+2, Peter Smith 
 wrote:  
 
 On Fri, Nov 18, 2022 at 4:50 AM PG Doc comments form
 wrote:
>
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/15/logical-replication-row-filter.html
> Description:

Hi,

FYI - I have forwarded this post to the hacker's list, where I think
it will receive more attention.

I am not sure why that (above) page was cited -- the section "31.3 Row
Filters" is specifically about row filtering, whereas the items you
reported seem unrelated to row filters, but are generic for all
Logical Replication.

>
> There are several things missing here and some of them I found to be highly
> important:
> 1. How can I find why a logical replication failed. Currently I only can see
> it "does nothing" in pg_stat_subscriptions.

There should be logs reporting any replication conflicts etc. See [1]
for example logs. See also the answer for #2 below.

> 2. In case of copying the existing data: how can I find which tables or
> partitions were processed and which are on the processing queue (while
> monitoring I have observed no specific order or rule).

There is no predictable processing queue or order - The initial
tablesyncs might be happening in multiple asynchronous processes
according to the GUC max_sync_workers_per_subscription [2].

Below I show examples of replicating two tables (tab1 and tab2).

~~

From the logs you should see which table syncs have completed OK:

e.g. (the initial copy is all good)
test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
dbname=test_pub' PUBLICATION pub1;
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# 2022-11-23 12:23:18.501 AEDT [27961] LOG:  logical
replication apply worker for subscription "sub1" has started
2022-11-23 12:23:18.513 AEDT [27963] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab1" has
started
2022-11-23 12:23:18.524 AEDT [27965] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab2" has
started
2022-11-23 12:23:18.593 AEDT [27963] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab1" has
finished
2022-11-23 12:23:18.611 AEDT [27965] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab2" has
finished

e.g. (where there is conflict in table tab2)
test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
dbname=test_pub' PUBLICATION pub1;
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# 2022-11-23 12:40:56.794 AEDT [23401] LOG:  logical
replication apply worker for subscription "sub1" has started
2022-11-23 12:40:56.808 AEDT [23403] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab1" has
started
2022-11-23 12:40:56.819 AEDT [23405] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab2" has
started
2022-11-23 12:40:56.890 AEDT [23405] ERROR:  duplicate key value
violates unique constraint "tab2_pkey"
2022-11-23 12:40:56.890 AEDT [23405] DETAIL:  Key (id)=(1) already exists.
2022-11-23 12:40:56.890 AEDT [23405] CONTEXT:  COPY tab2, line 1
2022-11-23 12:40:56.891 AEDT [3233] LOG:  background worker "logical
replication worker" (PID 23405) exited with exit code 1
2022-11-23 12:40:56.902 AEDT [23403] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab1" has
finished
...

~~

Alternatively, you can use some SQL query to discover which tables of
the subscription had attained a READY state. The READY state (denoted
by 'r') means that the initial COPY was completed ok. The table
replication state is found in the 'srsubstate' column. See [3]

e.g. (the initial copy is all good)
test_sub=# select
sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from
pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s
where ut.relid = sr.srrelid and s.oid=sr.srsubid;
 srsubid | srrelid | subname | relname | srsubstate
---------+---------+---------+---------+------------
   16418 |   16409 | sub1    | tab1    | r
   16418 |   16402 | sub1    | tab2    | r
(2 rows)

e.g. (where it has a conflict in table tab2, so it did not get to READY state)
test_sub=# select
sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from
pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s
where ut.relid = sr.srrelid and s.oid=sr.srsubid;2022-11-23
12:41:37.686 AEDT [24501] LOG:  logical replication table
synchronization worker for subscription "sub1", table "tab2" has
started
2022-11-23 12:41:37.774 AEDT [24501] ERROR:  duplicate key value
violates unique constraint "tab2_pkey"
2022-11-23 12:41:37.774 AEDT [24501] DETAIL:  Key (id)=(1) already exists.
2022-11-23 

Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2022-11-23 Thread Alvaro Herrera
On 2022-Nov-22, Andres Freund wrote:

> ok 10 - standby is in recovery
> # Running: pg_ctl -D 
> /mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/src/bin/pg_ctl/tmp_check/t_003_promote_standby2_data/pgdata
>  promote
> waiting for server to promotepg_ctl: control file appears to be corrupt
> not ok 11 - pg_ctl promote of standby runs
> 
> #   Failed test 'pg_ctl promote of standby runs'
> #   at 
> /mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/TestLib.pm
>  line 474.

This triggered me on this proposal I saw yesterday
https://postgr.es/m/02fe0063-bf77-90d0-3cf5-e9fe7c2a4...@postgrespro.ru
I think trying to store more stuff in pg_control is dangerous and we
should resist it.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Operation log for major operations

2022-11-23 Thread Alvaro Herrera
On 2022-Nov-21, Dmitry Koval wrote:

> Concepts.
> -
> * operation log is placed in the file 'global/pg_control', starting from
> position 4097 (log size is 4kB);

I think storing this in pg_control is a bad idea.  That file is
extremely critical and if you break it, you're pretty much SOL on
recovering your data.  I suggest that this should use a separate file.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"All rings of power are equal,
But some rings of power are more equal than others."
 (George Orwell's The Lord of the Rings)




Re: Non-decimal integer literals

2022-11-23 Thread David Rowley
On Wed, 23 Nov 2022 at 21:54, David Rowley  wrote:
> I wonder if you'd be better off with something like:
>
> while (*ptr && isxdigit((unsigned char) *ptr))
> {
> if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
> goto out_of_range;
>
> tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
> }

Here's a delta diff with it changed to work that way.

David
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 2942b7c449..ce305b611d 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -136,13 +136,10 @@ pg_strtoint16(const char *s)
ptr += 2;
while (*ptr && isxdigit((unsigned char) *ptr))
{
-   int8    digit = hexlookup[(unsigned char) *ptr];
-
-   if (unlikely(pg_mul_s16_overflow(tmp, 16, &tmp)) ||
-   unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+   if (unlikely(tmp & 0xF000))
goto out_of_range;
 
-   ptr++;
+   tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
}
}
else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
@@ -151,11 +148,10 @@ pg_strtoint16(const char *s)
 
while (*ptr && (*ptr >= '0' && *ptr <= '7'))
{
-   int8    digit = (*ptr++ - '0');
-
-   if (unlikely(pg_mul_s16_overflow(tmp, 8, &tmp)) ||
-   unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+   if (unlikely(tmp & 0xE000))
goto out_of_range;
+
+   tmp = (tmp << 3) | (*ptr++ - '0');
}
}
else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
@@ -164,11 +160,10 @@ pg_strtoint16(const char *s)
 
while (*ptr && (*ptr >= '0' && *ptr <= '1'))
{
-   int8    digit = (*ptr++ - '0');
-
-   if (unlikely(pg_mul_s16_overflow(tmp, 2, &tmp)) ||
-   unlikely(pg_sub_s16_overflow(tmp, digit, &tmp)))
+   if (unlikely(tmp & 0x8000))
goto out_of_range;
+
+   tmp = (tmp << 1) | (*ptr++ - '0');
}
}
else
@@ -255,13 +250,10 @@ pg_strtoint32(const char *s)
ptr += 2;
while (*ptr && isxdigit((unsigned char) *ptr))
{
-   int8    digit = hexlookup[(unsigned char) *ptr];
-
-   if (unlikely(pg_mul_s32_overflow(tmp, 16, &tmp)) ||
-   unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+   if (unlikely(tmp & 0xF0000000))
goto out_of_range;
 
-   ptr++;
+   tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
}
}
else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
@@ -270,11 +262,10 @@ pg_strtoint32(const char *s)
 
while (*ptr && (*ptr >= '0' && *ptr <= '7'))
{
-   int8    digit = (*ptr++ - '0');
-
-   if (unlikely(pg_mul_s32_overflow(tmp, 8, &tmp)) ||
-   unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+   if (unlikely(tmp & 0xE0000000))
goto out_of_range;
+
+   tmp = (tmp << 3) | (*ptr++ - '0');
}
}
else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))
@@ -283,11 +274,10 @@ pg_strtoint32(const char *s)
 
while (*ptr && (*ptr >= '0' && *ptr <= '1'))
{
-   int8    digit = (*ptr++ - '0');
-
-   if (unlikely(pg_mul_s32_overflow(tmp, 2, &tmp)) ||
-   unlikely(pg_sub_s32_overflow(tmp, digit, &tmp)))
+   if (unlikely(tmp & 0x80000000))
goto out_of_range;
+
+   tmp = (tmp << 1) | (*ptr++ - '0');
}
}
else
@@ -382,13 +372,10 @@ pg_strtoint64(const char *s)
ptr += 2;
while (*ptr && isxdigit((unsigned char) *ptr))
{
-   int8    digit = hexlookup[(unsigned char) *ptr];
-
-   if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
-   unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
+   if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
goto out_of_range;
 
-   ptr++;
+   tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
}
  

Re: Non-decimal integer literals

2022-11-23 Thread David Rowley
On Wed, 23 Nov 2022 at 02:37, Peter Eisentraut
 wrote:
> Here is a new patch.

This looks like quite an inefficient way to convert a hex string into an int64:

while (*ptr && isxdigit((unsigned char) *ptr))
{
int8    digit = hexlookup[(unsigned char) *ptr];

if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
goto out_of_range;

ptr++;
}

I wonder if you'd be better off with something like:

while (*ptr && isxdigit((unsigned char) *ptr))
{
if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
goto out_of_range;

tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
}

Going by [1], clang will actually use multiplication by 16 to
implement the former. gcc is better and shifts left by 4, so likely
won't improve things for gcc.  It seems worth doing it this way for
anything that does not have HAVE__BUILTIN_OP_OVERFLOW anyway.

David

[1] https://godbolt.org/z/jz6Th6jnM




Re: [DOCS] Stats views and functions not in order?

2022-11-23 Thread Peter Smith
On Thu, Nov 17, 2022 at 8:46 AM David G. Johnston
 wrote:
>
> On Tue, Nov 15, 2022 at 6:39 PM Peter Smith  wrote:
>>
>>
>> I was also wondering (but have not yet done) if the content *outside*
>> the tables should be reordered to match the table 28.1/28.2 order.
>>
>> Thoughts?
>>

Thanks for the feedback/suggestions

>
> I would love to do away with the ToC listing of view names in 28.2 altogether.
>

OK, done. See patch 0006. To prevent all the views sections from
participating in the ToC I simply changed them to  instead of
. I’m not 100% sure if this was a brilliant modification or a
total hack, but it does do exactly what you wanted.

> Also, make it so each view ends up being its own separate page.
>

I did not do this. AFAIK those views of chapter 54 get rendered to
separate pages only because they are top-level . So I do not
know how to put all these stats views onto different pages without
radically changing the document structure. Anyway – doing this would
be incompatible with my  changes of patch 0006 (see above).


> The name of the views in the table should then be the hyperlinks to those 
> pages.
>

OK done. See patch 0005. All the view names (in column one of the
tables) are hyperlinked to the views the same way as Chapter 54 does.
The tables are a lot cleaner now. A couple of inconsistent view ids
were also corrected.

> Basically the way Chapter 54.1 works.  Though the interplay between the top 
> Chapter 54 and 54.1 is a bit repetitive.
>
> https://www.postgresql.org/docs/current/views.html
>
> I wonder whether having the table be structured but the ToC be purely 
> alphabetical would be considered a good idea...
>
> The tables need hyperlinks regardless.  I wouldn't insist on changing the 
> ordering to match the table, especially with the hyperlinks, but I also 
> wouldn't reject it.  Figuring out how to make them one-per-page would be time 
> better spent though.
>

PSA new patches. Now there are 6 of them. If some of the earlier
patches are agreeable can those ones please be committed? (because I
think this patch may be susceptible to needing a big rebase if
anything in those tables changes).

--
Kind Regards,
Peter Smith.
Fujitsu Australia.


v6-0002-Re-order-Table-28.2-Collected-Statistics-Views.patch
Description: Binary data


v6-0003-Re-order-Table-28.12-Wait-Events-of-type-LWLock.patch
Description: Binary data


v6-0005-Cleanup-view-name-hyperlinks-for-Tables-28.1-and-.patch
Description: Binary data


v6-0004-Re-order-Table-28.35-Per-Backend-Statistics-Funct.patch
Description: Binary data


v6-0006-Remove-all-stats-views-from-the-ToC-of-28.2.patch
Description: Binary data


v6-0001-Re-order-sections-of-28.4.-Progress-Reporting.patch
Description: Binary data


Re: Prefetch the next tuple's memory during seqscans

2022-11-23 Thread sirisha chamarthi
On Tue, Nov 22, 2022 at 11:44 PM David Rowley  wrote:

> On Wed, 23 Nov 2022 at 20:29, sirisha chamarthi
>  wrote:
> > I ran your test1 exactly like your setup except the row count is 300
> (with 13275 blocks). Shared_buffers is 128MB and the hardware configuration
> details at the bottom of the mail. It appears Master + 0001 + 0005
> regressed compared to master slightly .
>
> Thank you for running these tests.
>
> Can you share if the plans used for these queries was a parallel plan?
> I had set max_parallel_workers_per_gather to 0 to remove the
> additional variability from parallel query.
>
> Also, 13275 blocks is 104MBs, does EXPLAIN (ANALYZE, BUFFERS) indicate
> that all pages were in shared buffers? I used pg_prewarm() to ensure
> they were so that the runs were consistent.
>

I reran the test with setting max_parallel_workers_per_gather = 0 and with
pg_prewarm. Appears I missed some step while testing on the master, thanks
for sharing the details. New numbers show master has higher latency
than *Master +
0001 + 0005*.

*Master*

Before vacuum:
latency average = 452.881 ms

After vacuum:
latency average = 393.880 ms

*Master + 0001 + 0005*
Before vacuum:
latency average = 441.832 ms

After vacuum:
latency average = 369.591 ms


Re: New docs chapter on Transaction Management and related changes

2022-11-23 Thread Justin Pryzby
On Tue, Nov 22, 2022 at 01:50:36PM -0500, Bruce Momjian wrote:
> +
> +  
> +   A more complex example with multiple nested subtransactions:
> +
> +BEGIN;
> +INSERT INTO table1 VALUES (1);
> +SAVEPOINT sp1;
> +INSERT INTO table1 VALUES (2);
> +SAVEPOINT sp2;
> +INSERT INTO table1 VALUES (3);
> +RELEASE SAVEPOINT sp2;
> +INSERT INTO table1 VALUES (4))); -- generates an error
> +
> +   In this example, the application requests the release of the savepoint
> +   sp2, which inserted 3.  This changes the insert's
> +   transaction context to sp1.  When the statement
> +   attempting to insert value 4 generates an error, the insertion of 2 and
> +   4 are lost because they are in the same, now-rolled back savepoint,
> +   and value 3 is in the same transaction context.  The application can
> +   now only choose one of these two commands, since all other commands
> +   will be ignored with a warning:
> +
> +   ROLLBACK;
> +   ROLLBACK TO SAVEPOINT sp1;
> +
> +   Choosing ROLLBACK will abort everything, including
> +   value 1, whereas ROLLBACK TO SAVEPOINT sp1 will retain
> +   value 1 and allow the transaction to continue.
> +  

This mentions a warning, but what happens is actually an error:

postgres=!# select;
ERROR:  current transaction is aborted, commands ignored until end of 
transaction block




Re: More efficient build farm animal wakeup?

2022-11-23 Thread Thomas Munro
On Wed, Nov 23, 2022 at 2:09 PM Andres Freund  wrote:
> It's a huge improvement here.

Same here. eelpout + elver looking good, just a fraction of a second
hitting that web server each minute.  Long polling will be better and
shave off 30 seconds (+/- 30) on start time, but this avoids a lot of
useless churn without even needing a local mirror.  Thanks Andrew!




<    1   2