Re: appendBinaryStringInfo stuff

2022-12-21 Thread Tom Lane
David Rowley  writes:
>   22.57%  postgres  [.] escape_json

Hmm ... shouldn't we do something like

-appendStringInfoString(buf, "\\b");
+appendStringInfoCharMacro(buf, '\\');
+appendStringInfoCharMacro(buf, 'b');

and so on in that function?  I'm not convinced that this one
hotspot justifies inlining appendStringInfoString everywhere.
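
For context, a sketch of how a couple of the escape cases inside escape_json()'s
switch would read after that change (assuming the function's current structure;
only two cases are shown):

			case '\b':
				appendStringInfoCharMacro(buf, '\\');
				appendStringInfoCharMacro(buf, 'b');
				break;
			case '\n':
				appendStringInfoCharMacro(buf, '\\');
				appendStringInfoCharMacro(buf, 'n');
				break;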

regards, tom lane




Re: Minimal logical decoding on standbys

2022-12-21 Thread Drouvot, Bertrand

Hi,

On 12/21/22 10:06 AM, Drouvot, Bertrand wrote:

Hi,

On 12/20/22 10:41 PM, Robert Haas wrote:

On Tue, Dec 20, 2022 at 3:39 PM Robert Haas  wrote:
I guess whatever else we
do here, we should fix the comments.



Agree, please find attached a patch proposal doing so.



Bottom line is that I think the two cases that have alignment issues
as coded are xl_hash_vacuum_one_page and gistxlogDelete. Everything
else is OK, as far as I can tell right now.



Thanks a lot for the repro(s) and explanations! That's very useful/helpful.

Based on your discovery about the wrong comments above, I'm now tempted to fix 
those 2 alignment issues by using a FLEXIBLE_ARRAY_MEMBER within those structs 
(as you proposed in [1]), since that should also prevent any possible wrong 
comments about where the array is located.

What do you think?


As mentioned above, it looks to me that making use of a FLEXIBLE_ARRAY_MEMBER 
is a good choice.
So, please find attached v35 making use of a FLEXIBLE_ARRAY_MEMBER in 
xl_hash_vacuum_one_page and gistxlogDelete (your 2 repros are not failing 
anymore).
I've also added a few words in the commit message in 0001 about it.

So, we end up with:

(gdb) ptype /o struct xl_hash_vacuum_one_page
/* offset  |size */  type = struct xl_hash_vacuum_one_page {
/*  0  |   4 */TransactionId snapshotConflictHorizon;
/*  4  |   4 */int ntuples;
/*  8  |   1 */_Bool isCatalogRel;
/* XXX  1-byte hole  */
/* 10  |   0 */OffsetNumber offsets[];
/* XXX  2-byte padding   */

   /* total size (bytes):   12 */
 }

(gdb) ptype /o struct gistxlogDelete
/* offset  |size */  type = struct gistxlogDelete {
/*  0  |   4 */TransactionId snapshotConflictHorizon;
/*  4  |   2 */uint16 ntodelete;
/*  6  |   1 */_Bool isCatalogRel;
/* XXX  1-byte hole  */
/*  8  |   0 */OffsetNumber offsets[];

   /* total size (bytes):8 */
 }

While looking at it, I have a question: xl_hash_vacuum_one_page.ntuples is an 
int; do you see any reason why it is not a uint16? (We would save 4 bytes in 
the struct.)

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From a765103d88411e344bc2b05897631a3e69526467 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Wed, 21 Dec 2022 14:14:10 +
Subject: [PATCH v35 6/6] Fixing Walsender corner case with logical decoding on
 standby.

The problem is that WalSndWaitForWal() waits for the *replay* LSN to
increase, but gets woken up by walreceiver when new WAL has been
flushed. This means that typically walsenders will get woken up at the
same time as the startup process - so by the time the logical walsender
checks GetXLogReplayRecPtr() it's unlikely that the startup process has
already replayed the record and updated XLogCtl->lastReplayedEndRecPtr.

Introduce a new condition variable to fix this corner case.
---
 src/backend/access/transam/xlogrecovery.c | 28 
 src/backend/replication/walsender.c   | 31 +--
 src/backend/utils/activity/wait_event.c   |  3 +++
 src/include/access/xlogrecovery.h |  3 +++
 src/include/replication/walsender.h   |  1 +
 src/include/utils/wait_event.h|  1 +
 6 files changed, 59 insertions(+), 8 deletions(-)
  41.2% src/backend/access/transam/
  48.5% src/backend/replication/
   3.6% src/backend/utils/activity/
   3.4% src/include/access/

diff --git a/src/backend/access/transam/xlogrecovery.c 
b/src/backend/access/transam/xlogrecovery.c
index d5a81f9d83..ac8b169ab5 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -358,6 +358,9 @@ typedef struct XLogRecoveryCtlData
RecoveryPauseState recoveryPauseState;
ConditionVariable recoveryNotPausedCV;
 
+   /* Replay state (see getReplayedCV() for more explanation) */
+   ConditionVariable replayedCV;
+
slock_t info_lck;   /* locks shared variables shown 
above */
 } XLogRecoveryCtlData;
 
@@ -467,6 +470,7 @@ XLogRecoveryShmemInit(void)
SpinLockInit(&XLogRecoveryCtl->info_lck);
InitSharedLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
ConditionVariableInit(&XLogRecoveryCtl->recoveryNotPausedCV);
+   ConditionVariableInit(&XLogRecoveryCtl->replayedCV);
 }
 
 /*
@@ -1916,6 +1920,11 @@ ApplyWalRecord(XLogReaderState *xlogreader, XLogRecord 
*record, TimeLineID *repl
XLogRecoveryCtl->lastReplayedTLI = *replayTLI;
SpinLockRelease(&XLogRecoveryCtl->info_lck);
 
+   /*
+* wake up walsender(s) used by logical decoding on standby.
+*/
+   ConditionVariableBroadcast(&XLogRecoveryCtl->replayedCV);
+
/*
 * If rm_redo called XLogRequestWalReceiverReply, then we wake up the
 * receiver so that it notices the updated 
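
The walsender.c side of the patch is not shown above; roughly, a logical
walsender waiting for a record to be replayed could use the new condition
variable like this (getReplayedCV() is the accessor named in the comment above,
"loc" stands for the target LSN in WalSndWaitForWal(), and the wait-event name
is an assumption):

	XLogRecPtr	replayPtr = GetXLogReplayRecPtr(NULL);

	ConditionVariablePrepareToSleep(getReplayedCV());
	while (replayPtr < loc)
	{
		/* woken by ConditionVariableBroadcast() in ApplyWalRecord() */
		ConditionVariableSleep(getReplayedCV(),
							   WAIT_EVENT_WAL_SENDER_WAIT_REPLAY);
		replayPtr = GetXLogReplayRecPtr(NULL);
	}
	ConditionVariableCancelSleep();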

Re: appendBinaryStringInfo stuff

2022-12-21 Thread David Rowley
On Tue, 20 Dec 2022 at 11:26, David Rowley  wrote:
> It would be good to see some measurements to find out how much adding
> the strlen calls back in would cost us.

I tried this out.  I'm not pretending I found the best test to
highlight how much the performance will change in a real-world case.
I just wanted to get an indication of whether changing jsonb's
output function to make more use of appendStringInfoString instead of
appendBinaryStringInfo is likely to affect performance.

Also, in test 2, I picked a use case that already makes quite a bit of
use of appendStringInfoString and checked whether inlining that
function would help improve things.  I imagine test 2 really is not
bottlenecked on appendStringInfoString enough to get a true idea of
how much inlining appendStringInfoString could really help (spoiler:
it helps quite a bit).

Test 1: See if using appendStringInfoString instead of
appendBinaryStringInfo hinders jsonb output performance.

setup:
create table jb (j jsonb);
insert into jb select row_to_json(pg_class) from pg_class;
vacuum freeze analyze jb;

bench.sql:
select sum(length(j::text)) from jb;

master (@3f28bd73):
$ pgbench -n -T 60 -f bench.sql -M prepared postgres | grep latency
latency average = 1.896 ms
latency average = 1.885 ms
latency average = 1.899 ms

  22.57%  postgres  [.] escape_json
  21.83%  postgres  [.] pg_utf_mblen
   9.23%  postgres  [.] JsonbIteratorNext.part.0
   7.12%  postgres  [.] AllocSetAlloc
   4.07%  postgres  [.] pg_mbstrlen_with_len
   3.71%  postgres  [.] JsonbToCStringWorker
   3.70%  postgres  [.] fillJsonbValue
   3.17%  postgres  [.] appendBinaryStringInfo
   2.95%  postgres  [.] enlargeStringInfo
   2.09%  postgres  [.] jsonb_put_escaped_value
   1.89%  postgres  [.] palloc

master + 0001-Use-appendStringInfoString-instead-of-appendBinarySt.patch

$ pgbench -n -T 60 -f bench.sql -M prepared postgres | grep latency
latency average = 1.912 ms
latency average = 1.912 ms
latency average = 1.912 ms (~1% slower)

  22.38%  postgres  [.] escape_json
  21.98%  postgres  [.] pg_utf_mblen
   9.07%  postgres  [.] JsonbIteratorNext.part.0
   5.93%  postgres  [.] AllocSetAlloc
   4.11%  postgres  [.] pg_mbstrlen_with_len
   3.87%  postgres  [.] fillJsonbValue
   3.66%  postgres  [.] JsonbToCStringWorker
   2.28%  postgres  [.] enlargeStringInfo
   2.15%  postgres  [.] appendStringInfoString
   1.98%  postgres  [.] jsonb_put_escaped_value
   1.92%  postgres  [.] palloc
   1.58%  postgres  [.] appendBinaryStringInfo
   1.42%  postgres  [.] pnstrdup

Test 2: Test if inlining appendStringInfoString helps

bench.sql:
select sum(length(pg_get_ruledef(oid))) from pg_rewrite;

master (@3f28bd73):
$ pgbench -n -T 60 -f bench.sql postgres | grep latency
latency average = 16.355 ms
latency average = 16.290 ms
latency average = 16.303 ms

static inline appendStringInfoString
$ pgbench -n -T 60 -f bench.sql postgres | grep latency
latency average = 15.690 ms
latency average = 15.575 ms
latency average = 15.604 ms (~4.4% faster)
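
For reference, "static inline appendStringInfoString" here just means moving
the function's existing one-line body from stringinfo.c into stringinfo.h,
roughly:

static inline void
appendStringInfoString(StringInfo str, const char *s)
{
	appendBinaryStringInfo(str, s, strlen(s));
}

which can let the compiler fold the strlen() into the caller when the argument
is a string literal.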

David




Re: Force streaming every change in logical decoding

2022-12-21 Thread Kyotaro Horiguchi
At Thu, 22 Dec 2022 12:35:46 +0530, Amit Kapila  wrote 
in 
> I have addressed these comments in the attached. Additionally, I have
> modified the docs and commit messages to make those clear. I think
> instead of adding new tests with this patch, it may be better to
> change some of the existing tests related to streaming to use this
> parameter as that will clearly show one of the purposes of this patch.

I'm late to this, but I'm worried that we might someday regret the lack
of an explicit default value. Don't we really need one?

+Allows streaming or serializing changes immediately in logical 
decoding.
+The allowed values of logical_decoding_mode are the
+empty string and immediate. When set to
+immediate, stream each change if
+streaming option is enabled, otherwise, serialize
+each change.  When set to an empty string, which is the default,
+decoding will stream or serialize changes when
+logical_decoding_work_mem is reached.

With (really) fresh eyes, it took me a while to understand what
the "streaming" option is. Couldn't we augment the description a
bit?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Force streaming every change in logical decoding

2022-12-21 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thank you for updating the patch. I have also checked it,
and basically it works well. Almost all the issues I found were
addressed by v4.

One comment: while setting logical_decoding_mode to a wrong value,
I got an unfriendly ERROR message.

```
postgres=# SET logical_decoding_mode = 1;
ERROR:  invalid value for parameter "logical_decoding_mode": "1"
HINT:  Available values: , immediate
```

Here all acceptable enum values should be output in the HINT, but we cannot see the 
empty string.
Should we modify config_enum_get_options() to treat the empty string specially,
maybe rendering it as (empty)? Or we can ignore this for now.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Force streaming every change in logical decoding

2022-12-21 Thread Masahiko Sawada
On Thu, Dec 22, 2022 at 4:05 PM Amit Kapila  wrote:
>
> On Wed, Dec 21, 2022 at 7:25 PM Masahiko Sawada  wrote:
> >
> > On Wed, Dec 21, 2022 at 10:14 PM shiy.f...@fujitsu.com
> >  wrote:
> >
> > The patch looks good to me. Some minor comments are:
> >
> > - * limit, but we might also adapt a more elaborate eviction strategy
> > - for example
> > - * evicting enough transactions to free certain fraction (e.g. 50%)
> > of the memory
> > - * limit.
> > + * limit, but we might also adapt a more elaborate eviction strategy - for
> > + * example evicting enough transactions to free certain fraction (e.g. 
> > 50%) of
> > + * the memory limit.
> >
> > This change is not relevant with this feature.
> >
> > ---
> > +if (logical_decoding_mode == LOGICAL_DECODING_MODE_DEFAULT
> > +&& rb->size < logical_decoding_work_mem * 1024L)
> >
> > Since we put '&&' before the new line in all other places in
> > reorderbuffer.c, I think it's better to make it consistent. The same
> > is true for the change for while loop in the patch.
> >
>
> I have addressed these comments in the attached. Additionally, I have
> modified the docs and commit messages to make those clear.

Thanks!

>  I think
> instead of adding new tests with this patch, it may be better to
> change some of the existing tests related to streaming to use this
> parameter as that will clearly show one of the purposes of this patch.

+1. I think test_decoding/sql/stream.sql and spill.sql are good
candidates and we change logical replication TAP tests in a separate
patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Force streaming every change in logical decoding

2022-12-21 Thread Amit Kapila
On Wed, Dec 21, 2022 at 7:25 PM Masahiko Sawada  wrote:
>
> On Wed, Dec 21, 2022 at 10:14 PM shiy.f...@fujitsu.com
>  wrote:
>
> The patch looks good to me. Some minor comments are:
>
> - * limit, but we might also adapt a more elaborate eviction strategy
> - for example
> - * evicting enough transactions to free certain fraction (e.g. 50%)
> of the memory
> - * limit.
> + * limit, but we might also adapt a more elaborate eviction strategy - for
> + * example evicting enough transactions to free certain fraction (e.g. 50%) 
> of
> + * the memory limit.
>
> This change is not relevant with this feature.
>
> ---
> +if (logical_decoding_mode == LOGICAL_DECODING_MODE_DEFAULT
> +&& rb->size < logical_decoding_work_mem * 1024L)
>
> Since we put '&&' before the new line in all other places in
> reorderbuffer.c, I think it's better to make it consistent. The same
> is true for the change for while loop in the patch.
>

I have addressed these comments in the attached. Additionally, I have
modified the docs and commit messages to make those clear. I think
instead of adding new tests with this patch, it may be better to
change some of the existing tests related to streaming to use this
parameter as that will clearly show one of the purposes of this patch.

-- 
With Regards,
Amit Kapila.


v4-0001-Add-logical_decoding_mode-GUC.patch
Description: Binary data


Re: [BUG] pg_upgrade test fails from older versions.

2022-12-21 Thread Anton A. Melnikov

Hello!

I divided the patch into two parts: the first part covers the modification of
the old dump, while the second one relates to dump filtering.

1) v2-0001-Remove-aclitem-from-old-dump.patch

On 19.12.2022 06:10, Michael Paquier wrote:

This is forgetting about materialized views, which is something that
pg_upgrade would also complain about when checking for relations with
aclitems.  As far as I can see, the only place in the main regression
test suite where we have an aclitem attribute is tab_core_types for
HEAD and the stable branches, so it would be enough to do this
change.  Anyway, wouldn't it be better to use the same conditions as
the WITH OIDS queries a few lines above, at least for consistency?

Note that check_for_data_types_usage() checks for tables, matviews and
indexes.


I found that 'ALTER ... ALTER COLUMN SET DATA TYPE text'
is not applicable to materialized views and indexes, and neither is DROP COLUMN.
So I couldn't do anything better than drop them in the old dump if they
contain at least one column of 'aclitem' type.

I've tested this script with:
CREATE TABLE acltable AS SELECT 1 AS int, 'postgres=a/postgres'::aclitem AS 
aclitem;
CREATE MATERIALIZED VIEW aclmview AS SELECT 'postgres=a/postgres'::aclitem AS 
aclitem;
CREATE INDEX aclindex on acltable (int) INCLUDE (aclitem);
performed in the regression database before creating the old dump.
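
For these test objects, the adjustments the DO block is expected to emit boil
down to something like the following (the exact statements are generated at run
time, so this is only an illustration; the index column is rewritten implicitly
when the parent table's column type is changed):

ALTER TABLE acltable ALTER COLUMN aclitem SET DATA TYPE text;
DROP MATERIALIZED VIEW aclmview;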

The only thing is that I haven't been able to find a case where an 'aclitem'
column would be preserved in an index after this type was replaced in the
parent table. So passing relkind = 'i' is probably redundant.
If it is possible to find such a case, it would be very interesting.

I also made the replacement logic for 'aclitem' columns in the tables closer
to the script above that removes OID columns. In that script I likewise found
that ALTER TABLE ... SET WITHOUT OIDS is not applicable to materialized views,
and ALTER MATERIALIZED VIEW doesn't support a WITHOUT OIDS clause.
Besides, I couldn't find any legal way to create a materialized view with OIDs
in version 11 or lower.
The command 'CREATE MATERIALIZED VIEW' doesn't support a WITH OIDS or (OIDS)
clause, and neither does ALTER MATERIALIZED VIEW, as mentioned above.
Even with the GUC default_with_oids = true:
CREATE TABLE oidtable AS SELECT 1 AS int;
CREATE MATERIALIZED VIEW oidmv AS SELECT * FROM oidtable;
give:
postgres=# SELECT oid::regclass::text FROM pg_class WHERE relname !~ '^pg_' AND 
relhasoids;
   oid
--
 oidtable
(1 row)
So I suggest excluding the check of materialized views from this DO block.
I would be grateful for remarks if I didn't consider some cases.

2) v2-0002-Additional-dumps-filtering.patch

On 19.12.2022 06:16, Michael Paquier wrote:


While thinking about that, an extra idea popped into my mind, as it may
be interesting to be able to filter out some of the diffs in some
contexts.  So what about adding in 002_pg_upgrade.pl a small-ish hook
in the shape of a new environment variable pointing to a file that adds
some custom filtering rules?


Yes. I made a hook that allows processing an external text file with additional
filtering rules, plus an example of such a file. Please take a look at it.

With the best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit f4a6f18f008ab4acd0d6923ddce7aa6d20bef08f
Author: Anton A. Melnikov 
Date:   Thu Dec 22 07:22:53 2022 +0300

Remove type 'aclitem' from old dump when test upgrade from older versions.

diff --git a/src/bin/pg_upgrade/upgrade_adapt.sql b/src/bin/pg_upgrade/upgrade_adapt.sql
index 27c4c7fd01..84389f3e5b 100644
--- a/src/bin/pg_upgrade/upgrade_adapt.sql
+++ b/src/bin/pg_upgrade/upgrade_adapt.sql
@@ -19,7 +19,8 @@ SELECT
   ver <= 906 AS oldpgversion_le96,
   ver <= 1000 AS oldpgversion_le10,
   ver <= 1100 AS oldpgversion_le11,
-  ver <= 1300 AS oldpgversion_le13
+  ver <= 1300 AS oldpgversion_le13,
+  ver <= 1500 AS oldpgversion_le15
   FROM (SELECT current_setting('server_version_num')::int / 100 AS ver) AS v;
 \gset
 
@@ -72,7 +73,7 @@ DO $stmt$
 FROM pg_class
 WHERE relname !~ '^pg_'
   AND relhasoids
-  AND relkind in ('r','m')
+  AND relkind = 'r'
 ORDER BY 1
   LOOP
 execute 'ALTER TABLE ' || rec || ' SET WITHOUT OIDS';
@@ -89,3 +90,58 @@ DROP OPERATOR public.#%# (pg_catalog.int8, NONE);
 DROP OPERATOR public.!=- (pg_catalog.int8, NONE);
 DROP OPERATOR public.#@%# (pg_catalog.int8, NONE);
 \endif
+
+-- The internal format of "aclitem" changed in PostgreSQL version 16
+-- so replace it with text type in tables and drop materialized views
+-- and indexes that contain column(s) of "aclitem" type.
+\if :oldpgversion_le15
+DO $$
+  DECLARE
+rec text;
+	col text;
+  BEGIN
+  FOR rec in
+SELECT oid::regclass::text
+FROM pg_class
+WHERE relname !~ '^pg_'
+  AND relkind = 'r'
+ORDER BY 1
+  LOOP
+	FOR col in
+		SELECT attname
+		FROM pg_attribute
+		WHERE attrelid::regclass::text = rec
+		  AND atttypid = 'aclitem'::regtype
+	LOOP
+		execute 'ALTER TABLE 

Re: Error-safe user functions

2022-12-21 Thread Tom Lane
Andrew Dunstan  writes:
> And here's another for contrib/seg
> I'm planning to commit these two in the next day or so.

I didn't look at the jsonpath one yet.  The seg patch passes
an eyeball check, with one minor nit: in seg_atof,

+   *result = float4in_internal(value, NULL, "real", value, escontext);

don't we want to use "seg" as the type_name?
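
Put together from the hunks quoted in this thread, the revised seg_atof() with
that suggestion applied would read roughly like this (the actual function body
in the patch may differ in detail):

static bool
seg_atof(char *value, float *result, struct Node *escontext)
{
	*result = float4in_internal(value, NULL, "seg", value, escontext);
	if (SOFT_ERROR_OCCURRED(escontext))
		return false;
	return true;
}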

Even more nitpicky, in

+seg_yyerror(SEG *result, struct Node *escontext, const char *message)
 {
+   if (SOFT_ERROR_OCCURRED(escontext))
+   return;

I'd be inclined to add some explanation, say

+seg_yyerror(SEG *result, struct Node *escontext, const char *message)
 {
+   /* if we already reported an error, don't overwrite it */
+   if (SOFT_ERROR_OCCURRED(escontext))
+   return;

regards, tom lane




Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-21 Thread Masahiko Sawada
On Wed, Dec 21, 2022 at 2:32 PM houzj.f...@fujitsu.com
 wrote:
>
> On Wed, Dec 21, 2022 9:07 AM Peter Smith  wrote:
> > FYI - applying v63-0001 using the latest master does not work.
> >
> > git apply 
> > ../patches_misc/v63-0001-Perform-streaming-logical-transactions-by-
> > parall.patch
> > error: patch failed: src/backend/replication/logical/meson.build:1
> > error: src/backend/replication/logical/meson.build: patch does not apply
> >
> > Looks like a recent commit [1] to add copyrights broke the patch
>
> Thanks for your reminder.
> Rebased the patch set.
>
> Attach the new patch set which also includes some
> cosmetic comment changes.
>

Thank you for updating the patch. Here are some comments on v64 patches:

While testing the patch, I realized that if all streamed transactions
are handled by parallel workers, there is no chance for the leader to
call maybe_reread_subscription() except for when waiting for the next
message. Due to this, the leader didn't stop for a while even if the
subscription gets disabled. It's an extreme case since my test was
that pgbench runs 30 concurrent transactions and logical_decoding_mode
= 'immediate', but we might want to make sure to call
maybe_reread_subscription() at least after committing/preparing a
transaction.

---
+if (pg_atomic_read_u32(&MyParallelShared->pending_stream_count) == 0)
+{
+if (pa_has_spooled_message_pending())
+return;
+
+elog(ERROR, "invalid pending streaming block number");
+}

I think it's helpful if the error message shows the invalid block number.

---
On Wed, Dec 7, 2022 at 10:13 PM houzj.f...@fujitsu.com
 wrote:
>
> On Wednesday, December 7, 2022 7:51 PM Masahiko Sawada 
>  wrote:
> > ---
> > If a value of max_parallel_apply_workers_per_subscription is not
> > sufficient, we get the LOG "out of parallel apply workers" every time
> > when the apply worker doesn't launch a worker. But do we really need
> > this log? It seems not consistent with
> > max_sync_workers_per_subscription behavior. I think we can check if
> > the number of running parallel workers is less than
> > max_parallel_apply_workers_per_subscription before calling
> > logicalrep_worker_launch(). What do you think?
>
> (Sorry, I missed this comment in last email)
>
> I personally feel giving a hint might help the user realize that the
> max_parallel_applyxxx setting is not enough for the current workload, and then they can
> adjust the parameter. Otherwise, the user might not have an easy way to check if more
> workers are needed.
>

Sorry, I missed this comment.

I think the number of concurrent transactions on the publisher could
be several hundreds, and the number of streamed transactions among
them could be several tens. I agree setting
max_parallel_apply_workers_per_subscription to a value high enough is
ideal but I'm not sure we want to inform users immediately that the
setting value is not enough. I think that with the default value
(i.e., 2), it will not be enough for many systems and the server logs
could be flooded with the LOG "out of parallel apply workers". If we
want to give a hint to users, we can probably show the statistics on
pg_stat_subscription_stats view such as the number of streamed
transactions that are handled by the leader and parallel workers.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




An oversight in ExecInitAgg for grouping sets

2022-12-21 Thread Richard Guo
I happened to notice $subject.  It happens when we build eqfunctions for
each grouping set.

 /* for each grouping set */
 for (int k = 0; k < phasedata->numsets; k++)
 {
 int length = phasedata->gset_lengths[k];

 if (phasedata->eqfunctions[length - 1] != NULL)
 continue;

 phasedata->eqfunctions[length - 1] =
 execTuplesMatchPrepare(scanDesc,
length,
aggnode->grpColIdx,
aggnode->grpOperators,
aggnode->grpCollations,
(PlanState *) aggstate);
 }

If it is an empty grouping set, its length will be zero, and accessing
phasedata->eqfunctions[length - 1] is not right.

I think we can just skip building the eqfunctions for empty grouping
set.

--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -3494,6 +3494,10 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
{
int length = phasedata->gset_lengths[k];

+   /* skip empty grouping set */
+   if (length == 0)
+   continue;
+
if (phasedata->eqfunctions[length - 1] != NULL)
continue;

Thanks
Richard


RE: Time delayed LR (WAS Re: logical replication restrictions)

2022-12-21 Thread Takamichi Osumi (Fujitsu)
Hi,


On Thursday, December 15, 2022 12:53 PM Amit Kapila  
wrote:
> On Thu, Dec 15, 2022 at 7:16 AM Kyotaro Horiguchi 
> wrote:
> >
> > At Wed, 14 Dec 2022 10:46:17 +, "Hayato Kuroda (Fujitsu)"
> >  wrote in
> > > I have implemented and tested that workers wake up per
> > > wal_receiver_timeout/2 and send keepalive. Basically it works well, but I
> found two problems.
> > > Do you have any good suggestions about them?
> > >
> > > 1)
> > >
> > > With this PoC at present, workers calculate sending intervals based
> > > on its wal_receiver_timeout, and it is suppressed when the parameter is 
> > > set
> to zero.
> > >
> > > This means that there is a possibility that the walsender times out
> > > when wal_sender_timeout on the publisher and wal_receiver_timeout on the
> > > subscriber are different.
> > > Supposing that wal_sender_timeout is 2min, wal_receiver_timeout is
> > > 5min,
> >
> > It seems to me wal_receiver_status_interval is better for this use.
> > It's enough for us to document that "wal_r_s_interval should be
> > shorter than wal_sender_timeout/2, especially when the logical replication
> > connection is using min_apply_delay. Otherwise you will suffer
> > repeated termination of walsender".
> >
> 
> This sounds reasonable to me.
Okay, I changed the time interval to wal_receiver_status_interval
and added a description about the timeout.

FYI, wal_receiver_status_interval by definition specifies
the minimum frequency for the WAL receiver process to send information
to the upstream. So I utilized that value for WaitLatch directly.
My documentation changes follow that.

> > > and min_apply_delay is 10min. The worker on subscriber will wake up
> > > per 2.5min and send keepalives, but walsender exits before the message
> arrives to publisher.
> > >
> > > One idea to avoid that is to send the min_apply_delay subscriber
> > > option to publisher and compare them, but it may be not sufficient.
> > > Because XXX_timout GUC parameters could be modified later.
> >
> > # Anyway, I don't think such asymmetric setup is preferable.
> >
> > > 2)
> > >
> > > The issue reported by Vignesh-san[1] has still remained. I have
> > > already analyzed that [2], the root cause is that flushed WAL is not
> > > updated and sent to the publisher. Even if workers send keepalive
> > > messages to pub during the delay, the flushed position cannot be modified.
> >
> > I didn't look closer but the cause I guess is walsender doesn't die
> > until all WAL has been sent, while logical delay chokes replication
> > stream.
For issue (2), a new thread has been created independently from this thread
in [1].
I'll leave any further changes on this point to that thread.

Attached is the updated patch.
Again, for the TAP tests I used the basic patch shared in [2] from another
thread that wakes up logical replication workers as needed.

[1] - 
https://www.postgresql.org/message-id/tyapr01mb586668e50fc2447ad7f92491f5...@tyapr01mb5866.jpnprd01.prod.outlook.com
[2] - 
https://www.postgresql.org/message-id/flat/20221122004119.GA132961%40nathanxps13


Best Regards,
Takamichi Osumi



v11-0001-wake-up-logical-workers-as-needed-instead-of-rel.patch
Description:  v11-0001-wake-up-logical-workers-as-needed-instead-of-rel.patch


v11-0002-Time-delayed-logical-replication-subscriber.patch
Description:  v11-0002-Time-delayed-logical-replication-subscriber.patch


RE: Time delayed LR (WAS Re: logical replication restrictions)

2022-12-21 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> > > Another related point to consider is what is the behavior of
> > > synchronous replication when shutdown has been performed both in the
> > > case of physical and logical replication especially when the
> > > time-delayed replication feature is enabled?
> >
> > In physical replication without any failures, it seems that users can stop
> > the primary server even if the applications are delaying on the secondary.
> > This is because sent WALs are immediately flushed on the secondary and the
> > walreceiver replies with its position.
> >
> 
> What happens when synchronous_commit's value is remote_apply and the
> user has also set synchronous_standby_names to corresponding standby?

Even if synchronous_commit is set to remote_apply, the primary server can be
shut down. The reason the walsender can exit is that it does not care whether
the WAL has been "applied" or not. It just checks the "flushed" WAL
position, not the applied one.

I think we should start another thread about changing the shutdown condition,
so I have forked one [1].

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB586668E50FC2447AD7F92491F5E89%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Exit walsender before confirming remote flush in logical replication

2022-12-21 Thread Hayato Kuroda (Fujitsu)
Dear hackers,
(I added Amit as CC because we discussed in another thread)

This is a fork thread from time-delayed logical replication [1].
While discussing it, we thought that we could relax the condition for walsender
shutdown [2][3].

Currently, walsenders delay a shutdown request until they confirm that all sent
data has been flushed on the remote side. This condition was added in 985bd7 [4]
to support clean switchover. Suppose there is a primary-secondary physical
replication system and we do the following steps. If any changes arrive during
step 2 but the walsender does not confirm the remote flush, the reboot in
step 3 may fail.

1. Stops primary server.
2. Promotes secondary to new primary.
3. Reboot (old)primary as new secondary.

In the case of logical replication, however, we cannot support the use case of
switching the roles publisher <-> subscriber. Suppose, in the same scenario as
above, additional transactions are committed while doing step 2. To catch up
with such changes the subscriber must receive the WAL related to those
transactions, but that cannot be done because a subscriber cannot request WAL
from a specific position. In that case, we must truncate all data in the new
subscriber once, and then create a new subscription with copy_data = true.

Therefore, I think that we can ignore the condition for shutting down the
walsender in logical replication.
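
For illustration only (this is not the attached patch), the idea amounts to
relaxing the final check in WalSndDone() along these lines; "am_logical_walsender"
is a hypothetical flag standing in for however the patch identifies a logical
walsender, while sentPtr and replicatedPtr follow the existing variables there:

	/* Sketch: skip the remote-flush check for logical walsenders. */
	if (WalSndCaughtUp && !pq_is_send_pending() &&
		(am_logical_walsender || sentPtr == replicatedPtr))
		proc_exit(0);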

This change may be useful for time-delayed logical replication. The walsender
currently waits to shut down until all changes are applied on the subscriber,
even if application is delayed. As a result, the publisher cannot be stopped if
a large delay time is specified.

PSA the minimal patch for that. I'm not sure whether the WalSndCaughtUp check
should also be omitted. Such a change may affect other parts like
WalSndWaitForWal(), but we can investigate that further.

[1]: https://commitfest.postgresql.org/41/3581/
[2]: 
https://www.postgresql.org/message-id/TYAPR01MB58661BA3BF38E9798E59AE14F5E19%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[3]: 
https://www.postgresql.org/message-id/CAA4eK1LyetktcphdRrufHac4t5DGyhsS2xG2DSOGb7OaOVcDVg%40mail.gmail.com
[4]: 
https://github.com/postgres/postgres/commit/985bd7d49726c9f178558491d31a570d47340459

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



0001-Exit-walsender-before-confirming-remote-flush-in-log.patch
Description:  0001-Exit-walsender-before-confirming-remote-flush-in-log.patch


Re: psql tab-complete

2022-12-21 Thread Thomas Munro
On Thu, Dec 22, 2022 at 3:25 PM Michael Paquier  wrote:
> On Wed, Dec 21, 2022 at 05:56:05PM -0500, Tom Lane wrote:
> > Thomas Munro  writes:
> >> Since I really want to be able to use VA_ARGS_NARGS() elsewhere,
> >
> > +1, that'd be handy.
> >
> >> ... Assuming those switches actually work as claimed, I see
> >> two choices: commit this hack with a comment reminding us to clean it
> >> up later, or drop 2015.
> >
> > As long as we can hide the messiness inside a macro definition,
> > I'd vote for the former.
>
> Agreed, even if it is worth noting that all the buildfarm members
> with MSVC use 2017 or newer.

Thanks.  Pushed.

PS is it a mistake that we still mention SDK 8.1 in the docs?




Re: Simplifications for error messages related to compression

2022-12-21 Thread Michael Paquier
On Tue, Dec 20, 2022 at 11:12:22PM -0600, Justin Pryzby wrote:
> Yes, and its current users (basebackup) output a gzip file, right ?
> 
> pg_dump -Fc doesn't output a gzip file, but now it's using user-facing
> compression specifications referring to it as "gzip".

Not all of them are compressed either, like the base TOC file.

> If you tell someone they can write -Z gzip, they'll be justified in
> expecting to see "gzip" as output.

That's the point where my interpretation differs from yours:
I don't really see it as an issue that we do not generate a gzip
file all the time in the output.  Honestly, I am not sure that there
is anything to win here by not using the same option interface for all
the binaries, or by adding tweaks to make pg_dump cope with that (like using
zlib as an extra alias).  The custom, directory and tar formats of
pg_dump have their own idea of which files to compress or not (for example,
the base TOC file is never compressed, so that one can do a pg_restore
-l).
--
Michael


signature.asc
Description: PGP signature


Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

2022-12-21 Thread Bharath Rupireddy
On Thu, Dec 22, 2022 at 7:57 AM Michael Paquier  wrote:
>
> On Wed, Dec 21, 2022 at 10:22:02PM +0100, Magnus Hagander wrote:
> > Basically, we take one thing and turn it into 3. That very naturally rings
> > with "split" to me.
> >
> > Parse might work as well, certainly better than dissect. I'd still prefer
> > split though.
>
> Honestly, I don't have any counter-arguments, so I am fine to switch
> the name as you are suggesting.  And pg_split_walfile_name() it is?

+1. FWIW, a simple patch is here
https://www.postgresql.org/message-id/CALj2ACXdZ7WGRD-_jPPeZugvWLN%2Bgxo3QtV-eZPRicUwjesM%3Dg%40mail.gmail.com.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Optimization issue of branching UNION ALL

2022-12-21 Thread Tom Lane
I wrote:
> Richard Guo  writes:
>> I noticed we also check 'parse->hasSubLinks' when we fix PHVs and
>> AppendRelInfos in pull_up_simple_subquery.  I'm not sure why we have
>> this check.  It seems not necessary.

> Yeah, I was wondering about that too ... maybe it was important
> in some previous state of the code?  I didn't do any archeology
> though.

After a bit of "git blame"-ing, it appears that that hasSubLinks
check was introduced in e006a24ad, which added a FlattenedSubLink
node type and needed to fix them up here:

+* We also have to fix the relid sets of any FlattenedSubLink nodes in
+* the parent query.  (This could perhaps be done by ResolveNew, but it

Then when I got rid of FlattenedSubLink in e549722a8, I neglected
to remove that check.  So I think maybe we don't need it, but I've
not tested.

regards, tom lane




Re: Optimization issue of branching UNION ALL

2022-12-21 Thread Tom Lane
Richard Guo  writes:
> I noticed we also check 'parse->hasSubLinks' when we fix PHVs and
> AppendRelInfos in pull_up_simple_subquery.  I'm not sure why we have
> this check.  It seems not necessary.

Yeah, I was wondering about that too ... maybe it was important
in some previous state of the code?  I didn't do any archeology
though.

> In remove_result_refs, I don't think we need to check 'lastPHId' again
> before calling substitute_phv_relids, since it has been checked a few
> lines earlier.

Oh, duh ...

regards, tom lane




Re: Inconsistency in reporting checkpointer stats

2022-12-21 Thread Kyotaro Horiguchi
At Wed, 21 Dec 2022 17:14:12 +0530, Nitin Jadhav 
 wrote in 
> [1]:
> 2022-12-21 10:52:25.931 UTC [63530] LOG:  checkpoint complete: wrote
> 4670 buffers (28.5%), wrote 3 slru buffers (0.0%); 0 WAL file(s)
> added, 0 removed, 4 recycled; write=0.045 s, sync=0.161 s, total=0.244
> s; sync files=25, longest=0.146 s, average=0.007 s; distance=66130 kB,
> estimate=66130 kB; lsn=0/5557C78, redo lsn=0/5557C40
> 
> Thanks & Regards,
> Nitin Jadhav
> 
> On Tue, Dec 20, 2022 at 11:08 PM Andres Freund  wrote:
> >
> > On 2022-12-20 08:18:36 -0500, Robert Haas wrote:
> > > I think that the SLRU information is potentially useful, but mixing it
> > > with the information about regular buffers just seems confusing.
> >
> > +1
> >
> > At least for now, it'd be different if/when we manage to move SLRUs to
> > the main buffer pool.

It sounds reasonable to exclude SLRU writes from buffer writes. But I'm
not sure it's useful to count SLRU writes separately, since they are below
the noise level of buffer writes in regular cases and the value
doesn't lead to tuning. However, I'm not strongly opposed to adding it
either.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PATCH] random_normal function

2022-12-21 Thread Michael Paquier
On Wed, Dec 21, 2022 at 08:47:32AM +0100, Fabien COELHO wrote:
> From a typical use case point of view, I'd say uniform, normal and
> exponential would make sense for floats. I'm also okay with generating
> uniform bytes pseudo-randomly.

I'd agree with this set.
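
For example, usage would presumably look like this (the mean/stddev argument
list is an assumption about the proposed signature, not something confirmed in
this thread):

SELECT random_normal(0.0, 1.0);     -- draw from a standard normal
SELECT random_normal(100.0, 15.0);  -- mean 100, standard deviation 15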

> I'd be more at ease to add simple functions rather than a special
> heavy-on-keywords syntax, even if standard.

Okay.

>> Note that SQLValueFunction made the addition of more returning data
>> types a bit more complicated (not much, still) than the new
>> COERCE_SQL_SYNTAX by going through a mapping function, so the
>> keyword/function mapping is straight-forward.
> 
> I'm unclear about why this paragraph is here.

Just saying that using COERCE_SQL_SYNTAX for SQL keywords is easier
than the older style.  If the SQL specification mentions no SQL
keywords for such things, this is irrelevant, of course :)
--
Michael


signature.asc
Description: PGP signature


Re: Optimization issue of branching UNION ALL

2022-12-21 Thread Richard Guo
On Thu, Dec 22, 2022 at 9:50 AM Tom Lane  wrote:

> Andrey Lepikhov  writes:
> > Superficial study revealed possibly unnecessary operations that could be
> > avoided:
> > 1. Walking across a query by calling substitute_phv_relids() even if
> > lastPHId shows that no one phv is presented.
>
> Yeah, we could do that, and it'd help some.


I noticed we also check 'parse->hasSubLinks' when we fix PHVs and
AppendRelInfos in pull_up_simple_subquery.  I'm not sure why we have
this check.  It seems not necessary.

In remove_result_refs, I don't think we need to check 'lastPHId' again
before calling substitute_phv_relids, since it has been checked a few
lines earlier.

Thanks
Richard


Re: Inconsistency in reporting checkpointer stats

2022-12-21 Thread Kyotaro Horiguchi
At Mon, 19 Dec 2022 18:05:38 +0530, Bharath Rupireddy 
 wrote in 
> On Fri, Dec 16, 2022 at 2:14 PM Kyotaro Horiguchi
>  wrote:
> >
> > In the first place I don't like that we count the same things twice.
> > Couldn't we count the number only by any one of them?
> >
> > If we remove CheckPointerStats.ckpt_bufs_written, CreateCheckPoint can
> > get the final number as the difference between the start-end values of
> > *the shared stats*. As long as a checkpoint runs on a single process,
> > trace info in BufferSync will work fine.  Assuming single process
> > checkpointing there must be no problem to do that. (Anyway the current
> > shared stats update for checkpointer is assuming single-process).
> 
> What if someone resets checkpointer shared stats with
> pg_stat_reset_shared()? In such a case, the checkpoint complete
> message will not have the stats, no?

I don't know. I don't believe the stats system is bound to such a
strict resetting policy.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

2022-12-21 Thread Michael Paquier
On Wed, Dec 21, 2022 at 10:22:02PM +0100, Magnus Hagander wrote:
> Basically, we take one thing and turn it into 3. That very naturally rings
> with "split" to me.
> 
> Parse might work as well, certainly better than dissect. I'd still prefer
> split though.

Honestly, I don't have any counter-arguments, so I am fine to switch
the name as you are suggesting.  And pg_split_walfile_name() it is?
--
Michael


signature.asc
Description: PGP signature


Re: psql tab-complete

2022-12-21 Thread Michael Paquier
On Wed, Dec 21, 2022 at 05:56:05PM -0500, Tom Lane wrote:
> Thomas Munro  writes:
>> Since I really want to be able to use VA_ARGS_NARGS() elsewhere,
> 
> +1, that'd be handy.
> 
>> ... Assuming those switches actually work as claimed, I see
>> two choices: commit this hack with a comment reminding us to clean it
>> up later, or drop 2015.
> 
> As long as we can hide the messiness inside a macro definition,
> I'd vote for the former.

Agreed, even if it is worth noting that all the buildfarm members
with MSVC use 2017 or newer.
--
Michael


signature.asc
Description: PGP signature


Re: Direct I/O

2022-12-21 Thread Thomas Munro
On Wed, Dec 14, 2022 at 5:48 PM Thomas Munro  wrote:
> 0001 -- David's palloc_aligned() patch 
> https://commitfest.postgresql.org/41/3999/
> 0002 -- I/O-align almost all buffers used for I/O
> 0003 -- Add the GUCs
> 0004 -- Throwaway hack to make cfbot turn the GUCs on

David pushed the first as commit 439f6175, so here is a rebase of the
rest.  I also fixed a couple of thinkos in the handling of systems
where we don't know how to do direct I/O.  In one place I had #ifdef
PG_O_DIRECT, but that's always defined, it's just that it's 0 on
Solaris and OpenBSD, and the check to reject the GUC wasn't quite
right on such systems.
From f6adf05ffa5bdf43cd3ca2bcc4dba39d1474ce09 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 13 Dec 2022 16:25:59 +1300
Subject: [PATCH v3 1/3] Introduce PG_IO_ALIGN_SIZE and align all I/O buffers.

In order to be allowed to use O_DIRECT in a later commit, we need to
align buffers to the virtual memory page size.  O_DIRECT would either
fail to work or fail to work efficiently without that on various
platforms.  Even without O_DIRECT, aligning on memory pages improves
traditional buffered I/O performance.

The alignment size is set to 4096, which is enough for currently known
systems.  There is no standard governing the requirements for O_DIRECT so
it's possible we might have to reconsider this approach or fail to work
on some exotic system, but for now this simplistic approach works and
it can be changed at compile time.

Adjust all call sites that allocate heap memory for file I/O to use the
new palloc_aligned() or MemoryContextAllocAligned() functions.  For
stack-allocated buffers, introduce PGIOAlignedBlock to respect
PG_IO_ALIGN_SIZE, if possible with this compiler.  Also align the main
buffer pool in shared memory.

If arbitrary alignment of stack objects is not possible with this
compiler, then completely disable the use of O_DIRECT by setting
PG_O_DIRECT to 0.  (This avoids the need to consider systems that have
O_DIRECT but don't have a compiler with an extension that can align
stack objects the way we want; that could be done but we don't currently
know of any such system, so it's easier to pretend there is no O_DIRECT
support instead: that's an existing and tested class of system.)

Add assertions that all buffers passed into smgrread(), smgrwrite(),
smgrextend() are correctly aligned, if PG_O_DIRECT isn't 0.

Author: Thomas Munro 
Author: Andres Freund 
Reviewed-by: Justin Pryzby 
Discussion: https://postgr.es/m/ca+hukgk1x532hyqj_mzfwt0n1zt8trz980d79wbjwnt-yyl...@mail.gmail.com
---
 contrib/bloom/blinsert.c  |  2 +-
 contrib/pg_prewarm/pg_prewarm.c   |  2 +-
 src/backend/access/gist/gistbuild.c   |  9 +++---
 src/backend/access/hash/hashpage.c|  2 +-
 src/backend/access/heap/rewriteheap.c |  2 +-
 src/backend/access/heap/visibilitymap.c   |  2 +-
 src/backend/access/nbtree/nbtree.c|  2 +-
 src/backend/access/nbtree/nbtsort.c   |  8 --
 src/backend/access/spgist/spginsert.c |  2 +-
 src/backend/access/transam/generic_xlog.c | 13 ++---
 src/backend/access/transam/xlog.c |  9 +++---
 src/backend/catalog/storage.c |  2 +-
 src/backend/storage/buffer/buf_init.c | 10 +--
 src/backend/storage/buffer/bufmgr.c   |  2 +-
 src/backend/storage/buffer/localbuf.c |  7 +++--
 src/backend/storage/file/buffile.c|  6 
 src/backend/storage/freespace/freespace.c |  2 +-
 src/backend/storage/page/bufpage.c|  5 +++-
 src/backend/storage/smgr/md.c | 15 +-
 src/backend/utils/sort/logtape.c  |  2 +-
 src/bin/pg_checksums/pg_checksums.c   |  2 +-
 src/bin/pg_rewind/local_source.c  |  4 +--
 src/bin/pg_upgrade/file.c |  4 +--
 src/common/file_utils.c   |  2 +-
 src/include/c.h   | 34 +--
 src/include/pg_config_manual.h|  7 +
 src/include/storage/fd.h  |  5 ++--
 src/tools/pgindent/typedefs.list  |  1 +
 28 files changed, 114 insertions(+), 49 deletions(-)

diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index dd26d6ac29..53cc617a66 100644
--- a/contrib/bloom/blinsert.c
+++ b/contrib/bloom/blinsert.c
@@ -166,7 +166,7 @@ blbuildempty(Relation index)
 	Page		metapage;
 
 	/* Construct metapage. */
-	metapage = (Page) palloc(BLCKSZ);
+	metapage = (Page) palloc_aligned(BLCKSZ, PG_IO_ALIGN_SIZE, 0);
 	BloomFillMetapage(index, metapage);
 
 	/*
diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c
index caff5c4a80..f50aa69eb2 100644
--- a/contrib/pg_prewarm/pg_prewarm.c
+++ b/contrib/pg_prewarm/pg_prewarm.c
@@ -36,7 +36,7 @@ typedef enum
 	PREWARM_BUFFER
 } PrewarmType;
 
-static PGAlignedBlock blockbuffer;
+static PGIOAlignedBlock blockbuffer;
 
 /*
  * pg_prewarm(regclass, mode text, fork text,
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index 

Re: Optimization issue of branching UNION ALL

2022-12-21 Thread Tom Lane
Andrey Lepikhov  writes:
> Complaint is about auto-generated query with 1E4 simple union all's (see 
> t.sh to generate a demo script). The reason: in REL_11_STABLE it is 
> planned and executed in a second, but REL_12_STABLE and beyond makes 
> matters worse: planning of such a query needs tons of gigabytes of RAM.

v11 (and prior versions) sucks just as badly.  In this example it
accidentally escapes trouble because it doesn't know how to pull up
a subquery with empty FROM.  But if you make the query look like

SELECT 1,1 FROM dual
  UNION ALL
SELECT 2,2 FROM dual
  UNION ALL
SELECT 3,3 FROM dual
...

then v11 chokes as well.  (Seems like we've overlooked the need
for check_stack_depth() and CHECK_FOR_INTERRUPTS() here ...)

> Superficial study revealed possibly unnecessary operations that could be 
> avoided:
> 1. Walking across a query by calling substitute_phv_relids() even if 
> lastPHId shows that no one phv is presented.

Yeah, we could do that, and it'd help some.

> 2. Iterative passes along the append_rel_list for replacing vars in the 
> translated_vars field. I can't grasp real necessity of passing all the 
> append_rel_list during flattening of an union all leaf subquery. No one 
> can reference this leaf, isn't it?

After thinking about that for awhile, I believe we can go further:
the containing_appendrel is actually the *only* part of the upper
query that needs to be adjusted.  So we could do something like
the attached.

This passes check-world, but I don't have quite enough confidence
in it to just commit it.

regards, tom lane

diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index c2239d18b4..cb2c1ef5e0 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -27,6 +27,7 @@
 
 #include "catalog/pg_type.h"
 #include "funcapi.h"
+#include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/multibitmapset.h"
 #include "nodes/nodeFuncs.h"
@@ -127,7 +128,7 @@ static bool find_dependent_phvs_in_jointree(PlannerInfo *root,
 			Node *node, int varno);
 static void substitute_phv_relids(Node *node,
   int varno, Relids subrelids);
-static void fix_append_rel_relids(List *append_rel_list, int varno,
+static void fix_append_rel_relids(PlannerInfo *root, int varno,
   Relids subrelids);
 static Node *find_jointree_node_for_rel(Node *jtnode, int relid);
 
@@ -805,6 +806,11 @@ pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode,
 		   JoinExpr *lowest_nulling_outer_join,
 		   AppendRelInfo *containing_appendrel)
 {
+	/* Since this function recurses, it could be driven to stack overflow. */
+	check_stack_depth();
+	/* Also, since it's a bit expensive, let's check for query cancel. */
+	CHECK_FOR_INTERRUPTS();
+
 	Assert(jtnode != NULL);
 	if (IsA(jtnode, RangeTblRef))
 	{
@@ -1229,8 +1235,9 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 		Relids		subrelids;
 
 		subrelids = get_relids_in_jointree((Node *) subquery->jointree, false);
-		substitute_phv_relids((Node *) parse, varno, subrelids);
-		fix_append_rel_relids(root->append_rel_list, varno, subrelids);
+		if (root->glob->lastPHId != 0)
+			substitute_phv_relids((Node *) parse, varno, subrelids);
+		fix_append_rel_relids(root, varno, subrelids);
 	}
 
 	/*
@@ -1409,7 +1416,10 @@ pull_up_union_leaf_queries(Node *setOp, PlannerInfo *root, int parentRTindex,
 
 		/*
 		 * Recursively apply pull_up_subqueries to the new child RTE.  (We
-		 * must build the AppendRelInfo first, because this will modify it.)
+		 * must build the AppendRelInfo first, because this will modify it;
+		 * indeed, that's the only part of the upper query where Vars
+		 * referencing childRTindex can exist at this point.)
+		 *
 		 * Note that we can pass NULL for containing-join info even if we're
 		 * actually under an outer join, because the child's expressions
 		 * aren't going to propagate up to the join.  Also, we ignore the
@@ -2107,6 +2117,25 @@ perform_pullup_replace_vars(PlannerInfo *root,
 	Query	   *parse = root->parse;
 	ListCell   *lc;
 
+	/*
+	 * If we are considering an appendrel child subquery (that is, a UNION ALL
+	 * member query that we're pulling up), then the only part of the upper
+	 * query that could reference the child yet is the translated_vars list of
+	 * the containing appendrel.  Furthermore, we do not need to insert PHVs
+	 * in the appendrel --- there isn't any outer join between.
+	 */
+	if (containing_appendrel)
+	{
+		bool		save_need_phvs = rvcontext->need_phvs;
+
+		rvcontext->need_phvs = false;
+		containing_appendrel->translated_vars = (List *)
+			pullup_replace_vars((Node *) containing_appendrel->translated_vars,
+rvcontext);
+		rvcontext->need_phvs = save_need_phvs;
+		return;
+	}
+
 	/*
 	 * Replace all of the top query's references to the subquery's outputs
 	 * with copies of the adjusted subtlist items, being careful 

Re: New strategies for freezing, advancing relfrozenxid early

2022-12-21 Thread Peter Geoghegan
On Wed, Dec 21, 2022 at 4:30 PM Jeff Davis  wrote:
> The confusing thing to me is perhaps just the name -- to me,
> "freeze_required" suggests that if it were set to true, it would cause
> freezing to happen. But as far as I can tell, it does not cause
> freezing to happen, it causes some other things to happen that are
> necessary when freezing happens (updating and using the right
> trackers).

freeze_required is about what's required, which tells us nothing about
what will happen when it's not required (could go either way,
depending on how lazy_scan_prune feels about it).

Setting freeze_required=true implies that heap_prepare_freeze_tuple
has stopped doing maintenance of the "no freeze" trackers. When it
sets freeze_required=true, it really *does* force freezing to happen,
in every practical sense. This happens because lazy_scan_prune does
what it's told to do when it's told that freezing is required. Because
of course it does, why wouldn't it?

So...I still don't get what you mean. Why would lazy_scan_prune ever
break its contract with heap_prepare_freeze_tuple? And in what sense
would you say that heap_prepare_freeze_tuple's setting
freeze_required=true doesn't quite amount to "forcing freezing"? Are
you worried about the possibility that lazy_scan_prune will decide to
rebel at some point, and fail to honor its contract with
heap_prepare_freeze_tuple?  :-)

> A minor point, no need to take action here. Perhaps rename the
> variable.

Andres was the one that suggested this name, actually. I initially
just called it "freeze", but I think that Andres had it right.

> I think 0001+0002 are about ready.

Great. I plan on committing 0001 in the next few days. Committing 0002
might take a bit longer.

Thanks
-- 
Peter Geoghegan




Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

2022-12-21 Thread David Rowley
On Wed, 16 Nov 2022 at 23:56, David Rowley  wrote:
> I've attached an updated patch.  The 0002 is just intended to exercise
> these allocations a little bit, it's not intended for commit. I was
> using that to ensure valgrind does not complain about anything.  It
> seems happy now.

After making some comment adjustments and having adjusted the
calculation of how to get the old chunk size when doing repalloc() on
an aligned chunk, I've now pushed this.
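
For reference, a minimal usage sketch matching the call sites touched by the
Direct I/O patch elsewhere in this digest (a BLCKSZ-sized buffer aligned to
PG_IO_ALIGN_SIZE, with no extra allocation flags):

	Page		page = (Page) palloc_aligned(BLCKSZ, PG_IO_ALIGN_SIZE, 0);

	/* ... fill the page and write it out ... */
	pfree(page);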

Thank you for the reviews.

David




Re: New strategies for freezing, advancing relfrozenxid early

2022-12-21 Thread Jeff Davis
On Tue, 2022-12-20 at 21:26 -0800, Peter Geoghegan wrote:
> When freeze_required is set to true, that means that lazy_scan_prune
> literally has no choice -- it simply must freeze the page as
> instructed by heap_prepare_freeze_tuple/FreezeMultiXactId. It's not
> just a strong suggestion -- it's crucial that lazy_scan_prune freezes
> the page as instructed.

The confusing thing to me is perhaps just the name -- to me,
"freeze_required" suggests that if it were set to true, it would cause
freezing to happen. But as far as I can tell, it does not cause
freezing to happen, it causes some other things to happen that are
necessary when freezing happens (updating and using the right
trackers).

A minor point, no need to take action here. Perhaps rename the
variable.

I think 0001+0002 are about ready.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: Avoid lost result of recursion (src/backend/optimizer/util/inherit.c)

2022-12-21 Thread Ranier Vilela
Thanks David, for looking at this.

Em ter., 20 de dez. de 2022 às 22:45, David Rowley 
escreveu:

> On Wed, 21 Dec 2022 at 13:15, Ranier Vilela  wrote:
> > IMO, I think that commit a61b1f7, has an oversight.
> > Currently is losing the result of recursion of function
> translate_col_privs_multilevel.
> >
> > Once the variable result (Bitmapset pointer) is reassigned.
> >
> > Without a test case for this patch.
> > But also, do not have a test case for the current thinko in head.
>
> hmm, that code looks a bit suspect to me too.
>
> Are you able to write a test that shows the bug which fails before
> your change and passes after applying it? I don't think it's quite
> enough to claim that your changes pass make check given that didn't
> fail before your change.
>
No, unfortunately not yet. Of course that test case would be very nice.
But my time for postgres is very limited.
For voluntary work, without any payment, I think what I have contributed is
good.


> Also, I think it might be better to take the opportunity to rewrite
> the function to not use recursion. I don't quite see the need for it
> here and it looks like that might have helped contribute to the
> reported issue.  Can't we just write this as a while loop instead of
> having the function call itself? It's not as if we need stack space
> for keeping track of multiple parents. A child relation can only have
> 1 parent. It seems to me that we can just walk there by looping.
>
 I took a look at the code that deals with the array (append_rel_array), and
all the loops seem different from each other, without any common pattern.
Unfortunately, I still can't get this loop to work correctly;
I need to learn more about Postgres structures and the correct way to
process them.
If you can do it, I'd be happy to learn the right way.

regards,
Ranier Vilela


Re: Error-safe user functions

2022-12-21 Thread Andrew Dunstan

On 2022-12-18 Su 09:42, Andrew Dunstan wrote:
> On 2022-12-14 We 17:37, Tom Lane wrote:
>> Andrew Dunstan  writes:
>>> Thanks, I have been looking at jsonpath, but I'm not quite sure how to
>>> get the escontext argument to the yyerror calls in jsonath_scan.l. Maybe
>>> I need to specify a lex-param setting?
>> You want a parse-param option in jsonpath_gram.y, I think; adding that
>> will persuade Bison to change the signatures of relevant functions.
>> Compare the mods I made in contrib/cube in ccff2d20e.
>>
>>  
>
> Yeah, I started there, but it's substantially more complex - unlike cube
> the jsonpath scanner calls the error routines as well as the parser.
>
>
> Anyway, here's a patch.
>
>

And here's another for contrib/seg

I'm planning to commit these two in the next day or so.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
From d7a0d93ee90ccc61b7179f236dcd1a824f30c8e6 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan 
Date: Wed, 21 Dec 2022 18:14:03 -0500
Subject: [PATCH] Provide error safety for contrib/seg's input function

---
 contrib/seg/expected/seg.out | 20 
 contrib/seg/seg.c|  4 ++--
 contrib/seg/segdata.h|  5 +++--
 contrib/seg/segparse.y   | 34 +++---
 contrib/seg/segscan.l| 10 +++---
 contrib/seg/sql/seg.sql  | 13 +
 6 files changed, 68 insertions(+), 18 deletions(-)

diff --git a/contrib/seg/expected/seg.out b/contrib/seg/expected/seg.out
index 2320464dd4..7a06113ed8 100644
--- a/contrib/seg/expected/seg.out
+++ b/contrib/seg/expected/seg.out
@@ -1273,3 +1273,23 @@ FROM test_seg WHERE s @> '11.2..11.3' OR s IS NULL ORDER BY s;
||  
 (144 rows)
 
+-- test non error throwing API
+SELECT str as seg,
+   pg_input_is_valid(str,'seg') as ok,
+   pg_input_error_message(str,'seg') as errmsg
+FROM unnest(ARRAY['-1 .. 1'::text,
+  '100(+-)1',
+  '',
+  'ABC',
+  '1 e7',
+  '1e700']) str;
+   seg| ok |errmsg 
+--++---
+ -1 .. 1  | t  | 
+ 100(+-)1 | t  | 
+  | f  | bad seg representation
+ ABC  | f  | bad seg representation
+ 1 e7 | f  | bad seg representation
+ 1e700| f  | "1e700" is out of range for type real
+(6 rows)
+
diff --git a/contrib/seg/seg.c b/contrib/seg/seg.c
index a7effc1b19..7f9fc24eb4 100644
--- a/contrib/seg/seg.c
+++ b/contrib/seg/seg.c
@@ -108,8 +108,8 @@ seg_in(PG_FUNCTION_ARGS)
 
 	seg_scanner_init(str);
 
-	if (seg_yyparse(result) != 0)
-		seg_yyerror(result, "bogus input");
+	if (seg_yyparse(result, fcinfo->context) != 0)
+		seg_yyerror(result, fcinfo->context, "bogus input");
 
 	seg_scanner_finish();
 
diff --git a/contrib/seg/segdata.h b/contrib/seg/segdata.h
index f4eafc865d..3d6e3e3f24 100644
--- a/contrib/seg/segdata.h
+++ b/contrib/seg/segdata.h
@@ -16,9 +16,10 @@ extern int	significant_digits(const char *s);
 
 /* in segscan.l */
 extern int	seg_yylex(void);
-extern void seg_yyerror(SEG *result, const char *message) pg_attribute_noreturn();
+extern void seg_yyerror(SEG *result, struct Node *escontext,
+		const char *message);
 extern void seg_scanner_init(const char *str);
 extern void seg_scanner_finish(void);
 
 /* in segparse.y */
-extern int	seg_yyparse(SEG *result);
+extern int	seg_yyparse(SEG *result, struct Node *escontext);
diff --git a/contrib/seg/segparse.y b/contrib/seg/segparse.y
index 0156c3e027..19f1dba50f 100644
--- a/contrib/seg/segparse.y
+++ b/contrib/seg/segparse.y
@@ -7,7 +7,9 @@
 #include 
 
 #include "fmgr.h"
+#include "nodes/miscnodes.h"
 #include "utils/builtins.h"
+#include "utils/float.h"
 
 #include "segdata.h"
 
@@ -19,7 +21,7 @@
 #define YYMALLOC palloc
 #define YYFREE   pfree
 
-static float seg_atof(const char *value);
+static bool seg_atof(char *value, float *result, struct Node *escontext);
 
 static int sig_digits(const char *value);
 
@@ -35,6 +37,7 @@ static char strbuf[25] = {
 
 /* BISON Declarations */
 %parse-param {SEG *result}
+%parse-param {struct Node *escontext}
 %expect 0
 %name-prefix="seg_yy"
 
@@ -77,7 +80,7 @@ range: boundary PLUMIN deviation
 		result->lower = $1.val;
 		result->upper = $3.val;
 		if ( result->lower > result->upper ) {
-			ereport(ERROR,
+			errsave(escontext,
 	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 	 errmsg("swapped boundaries: %g is greater than %g",
 			result->lower, result->upper)));
@@ -121,7 +124,10 @@ range: boundary PLUMIN deviation
 boundary: SEGFLOAT
 	{
 		/* temp variable avoids a gcc 3.3.x bug on Sparc64 */
-		float		val = seg_atof($1);
+		float		val;
+
+		if (!seg_atof($1, , escontext))
+			YYABORT;
 
 		$$.ext = '\0';
 		$$.sigd = sig_digits($1);
@@ -130,7 +136,10 @@ boundary: SEGFLOAT
 	| EXTENSION SEGFLOAT
 	{
 		/* temp variable avoids a gcc 3.3.x bug on Sparc64 */
-		float		val = seg_atof($2);
+		

Re: psql tab-complete

2022-12-21 Thread Tom Lane
Thomas Munro  writes:
> Since I really want to be able to use VA_ARGS_NARGS() elsewhere,

+1, that'd be handy.

> ... Assuming those switches actually work as claimed, I see
> two choices: commit this hack with a comment reminding us to clean it
> up later, or drop 2015.

As long as we can hide the messiness inside a macro definition,
I'd vote for the former.

regards, tom lane




Re: psql tab-complete

2022-12-21 Thread Thomas Munro
On Sat, Oct 26, 2019 at 4:59 PM Michael Paquier  wrote:
> On Fri, Oct 25, 2019 at 11:57:18AM +0300, Victor Spirin wrote:
> > This patch resolved one problem in the tab-complete.c on MSVC. The
> > VA_ARGS_NARGS macros now work correctly on Windows.
>
> Can you explain why and in what the use of EXPAND() helps with MSVC
> builds?  Any references which help to understand why this is better?
> If this change is needed, this also surely needs a comment to explain
> the difference.

Since I really want to be able to use VA_ARGS_NARGS() elsewhere, I
looked into this.  There are various derivatives of that macro splattered
all over the internet, some using GCC/Clang-only syntax and some that work
on both GCC and MSVC, but the original, coming as it does from a C
standards newsgroup [1], does not.  There are also lots of complaints
that the original standard version doesn't work on MSVC, with
analysis:

https://stackoverflow.com/questions/5134523/msvc-doesnt-expand-va-args-correctly
https://stackoverflow.com/questions/32399191/va-args-expansion-using-msvc
https://learn.microsoft.com/en-us/cpp/build/reference/zc-preprocessor?view=msvc-170

The short version is that __VA_ARGS__ is not tokenized the way the
standard requires (it's considered to be a single token unless you
shove it back through the preprocessor again, which is what EXPAND()
does), but you can fix that with /Zc:preprocessor.  That switch only
works in Visual Studio 2019 and up, and maybe also 2017 if you spell
it /experimental:preprocessor.  We still claim to support older
compilers.  Assuming those switches actually work as claimed, I see
two choices: commit this hack with a comment reminding us to clean it
up later, or drop 2015.

[1] https://groups.google.com/g/comp.std.c/c/d-6Mj5Lko_s
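
For reference, the usual shape of that workaround (a sketch, not necessarily
Victor's exact patch) is:

/*
 * With MSVC's traditional preprocessor, __VA_ARGS__ reaches the inner
 * macro as a single token, so the argument-counting trick miscounts.
 * Forcing one extra expansion pass fixes the tokenization.
 */
#define EXPAND(args) args

#define VA_ARGS_NARGS(...) \
	EXPAND(VA_ARGS_NARGS_(__VA_ARGS__, 4, 3, 2, 1, 0))

#define VA_ARGS_NARGS_(_1, _2, _3, _4, count, ...) count

The extra expansion pass forces __VA_ARGS__ to be re-tokenized into separate
arguments before the counting macro picks out the count.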




Re: float4in_internal

2022-12-21 Thread Andrew Dunstan


On 2022-12-21 We 10:33, Tom Lane wrote:
> Andrew Dunstan  writes:
>> The attached patch factors out the guts of float4in so that the new
>> float4in_internal function is callable without going through the fmgr
>> call sequence. This will make adjusting the seg module's input function
>> to handle soft errors simpler. A similar operation was done for float8in
>> some years ago in commit 50861cd683e. The new function has an identical
>> argument structure to float8in_internal.
> Looks reasonable except for one nitpick: the "out of range" message
> in the ERANGE case should be kept mentioning real, not the passed
> type_name, to be equivalent to the way float8in_internal does it.
> I lack enough caffeine to recall exactly why float8in_internal
> does it that way, but the comments are exceedingly clear that it was
> intentional, and I'm sure the same rationale would apply here.
>
> (float8in_internal also goes out of its way to show just the part of
> the string that is the number in that case, but I'm willing to let
> that pass for now.)
>
>   


Thanks for reviewing.

I made both these changes, to keep the two functions more closely aligned.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





postgres_fdw planning issue: EquivalenceClass changes confuse it

2022-12-21 Thread Tom Lane
I discovered that if you do this:

diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql 
b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2e6f7f4852..2ae231fd90 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -366,7 +366,7 @@ CREATE FUNCTION postgres_fdw_abs(int) RETURNS int AS $$
 BEGIN
 RETURN abs($1);
 END
-$$ LANGUAGE plpgsql IMMUTABLE;
+$$ LANGUAGE plpgsql IMMUTABLE STRICT;
 CREATE OPERATOR === (
 LEFTARG = int,
 RIGHTARG = int,

one of the plan changes that you get (attached) is that this query:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM local_tbl LEFT JOIN (SELECT ft1.* FROM ft1 INNER JOIN ft2 ON 
(ft1.c1 = ft2.c1 AND ft1.c1 < 100 AND ft1.c1 = postgres_fdw_abs(ft2.c2))) ss ON 
(local_tbl.c3 = ss.c3) ORDER BY local_tbl.c1 FOR UPDATE OF local_tbl;

can no longer do the join as a foreign join.  There are some other
changes that are more explicable, because strictness of the function
allows the planner to strength-reduce full joins to left joins, but
what happened here?

The answer is that once postgres_fdw_abs() is marked strict,
the EquivalenceClass machinery will group these clauses as an
EquivalenceClass:

ft1.c1 = ft2.c1 AND ft1.c1 = postgres_fdw_abs(ft2.c2)

which it will then choose to implement as a restriction clause
on ft2
ft2.c1 = postgres_fdw_abs(ft2.c2)
followed by a join clause
ft1.c1 = ft2.c1
This is a good and useful transformation, because it can get rid
of ft2 rows at the scan level instead of waiting for them to be
joined.  However, because we are treating postgres_fdw_abs() as
non-shippable in this particular test case, that means that ft2
now has a non-shippable restriction clause, causing foreign_join_ok
to give up here:

/*
 * If joining relations have local conditions, those conditions are
 * required to be applied before joining the relations. Hence the join can
 * not be pushed down.
 */
if (fpinfo_o->local_conds || fpinfo_i->local_conds)
return false;

In the other formulation, "ft1.c1 = postgres_fdw_abs(ft2.c2)"
is a non-shippable join clause, which foreign_join_ok knows how
to cope with.  So this seems like a fairly nasty asymmetry.

I ran into this while experimenting with the next phase in my
outer-join-vars patch set, in which the restriction that
below-outer-join Equivalence classes contain only strict members
will go away.  So that breaks this test, and I need to either
fix postgres_fdw or change the test case.

I experimented with teaching foreign_join_ok to pull up the child rels'
local_conds to be join local_conds if the join is an inner join,
which seems like a legal transformation.  I ran into a couple of
issues though, the hardest of which to solve is that in DML queries
we get "variable not found in subplan target lists" failures while
trying to build some EPQ queries.  That's because the pulled-up
condition uses a variable that we didn't think we'd need at the join
level.  That could possibly be fixed by handling these conditions
differently for the transmitted query than the EPQ query, but I'm
not sufficiently familiar with the postgres_fdw code to want to
take point on coding that.  In any case, this line of thought
would lead to several other plan changes in the postgres_fdw
regression tests, and I'm not sure if any of those would be
breaking the intent of the test cases.

Or I could just hack this one test so that it continues to
not be an EquivalenceClass case.

Thoughts?

regards, tom lane

diff -U3 /home/postgres/pgsql/contrib/postgres_fdw/expected/postgres_fdw.out /home/postgres/pgsql/contrib/postgres_fdw/results/postgres_fdw.out
--- /home/postgres/pgsql/contrib/postgres_fdw/expected/postgres_fdw.out	2022-12-21 16:28:13.601965240 -0500
+++ /home/postgres/pgsql/contrib/postgres_fdw/results/postgres_fdw.out	2022-12-21 16:30:20.152135336 -0500
@@ -923,7 +923,7 @@
 BEGIN
 RETURN abs($1);
 END
-$$ LANGUAGE plpgsql IMMUTABLE;
+$$ LANGUAGE plpgsql IMMUTABLE STRICT;
 CREATE OPERATOR === (
 LEFTARG = int,
 RIGHTARG = int,
@@ -1828,28 +1828,44 @@
 -- full outer join + WHERE clause with shippable extensions set
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t1.c3 FROM ft1 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE postgres_fdw_abs(t1.c1) > 0 OFFSET 10 LIMIT 10;
- QUERY PLAN 
-
- Foreign Scan
+QUERY PLAN
+--
+ Limit
   

Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

2022-12-21 Thread Magnus Hagander
On Wed, Dec 21, 2022 at 1:09 AM Michael Paquier  wrote:

> On Tue, Dec 20, 2022 at 06:04:40PM +0530, Bharath Rupireddy wrote:
> > On Tue, Dec 20, 2022 at 1:27 PM Magnus Hagander 
> wrote:
> >> Caught this thread late. To me, pg_dissect_walfile_name() is a
> >> really strange name for a function. Grepping our code I see the
> >> term dissect is used somewhere inside the regex code and exactly
> >> zero instances elsewhere. Which is why I definitely didn't
> >> recognize the term...
> >>
> >> Wouldn't something like pg_split_walfile_name() be a lot more
> >> consistent with the rest of our names?
>
> Fine by me to change that if there is little support for the current
> naming, though the current one does not sound that bad to me either.
>
> > Hm. FWIW, here's the patch.
>
> "split" is used a lot for the picksplit functions, but not in any of
> the existing functions as a name.  Some extra options: parse, read,
> extract, calculate, deduce, get.  "parse" would be something I would
> be OK with.
>


Not sure what you mean? We certainly have a lot of functions called split
that are not the picksplit ones: split_part(), regexp_split_to_array(),
regexp_split_to_table()... And there are things like tuple_data_split() in
pageinspect.

There are many other examples outside of Postgres as well: e.g. Python has
a split() for pathnames, and "almost every language" has a split() on
strings. I don't think I've ever seen dissect in a place like that either
(though I'm sure it exists somewhere, it's hardly common).

Basically, we take one thing and turn it into 3. That very naturally rings
with "split" to me.

Parse might work as well, certainly better than dissect. I'd still prefer
split though.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: assertion failures on BuildFarm that happened in slab.c

2022-12-21 Thread David Rowley
On Wed, 21 Dec 2022 at 16:28, Takamichi Osumi (Fujitsu)
 wrote:
> TRAP: failed Assert("dlist_is_empty(blocklist)"), File: "slab.c", Line: 564, 
> PID: 16148

> I'm not sure, but this could be related to a recent commit (d21ded75fd) in 
> [4].

It was. I've just pushed a fix.  Thanks for highlighting the problem.

David




Re: Array initialisation notation in syscache.c

2022-12-21 Thread Tom Lane
Nikita Malakhov  writes:
> Wanted to ask this since I encountered a need for a cache with 5 keys -
> why is the syscache index still limited to 4 keys?

Because there are no cases requiring 5, so far.

(A unique index with as many as 5 keys seems a bit fishy btw.)

regards, tom lane




Re: ExecRTCheckPerms() and many prunable partitions (checkAsUser)

2022-12-21 Thread Justin Pryzby
Alvaro, could you comment on this?




Re: dynamic result sets support in extended query protocol

2022-12-21 Thread Alvaro Herrera
On 2022-Nov-22, Peter Eisentraut wrote:

> I added tests using the new psql \bind command to test this functionality in
> the extended query protocol, which showed that this got broken since I first
> wrote this patch.  This "blame" is on the pipeline mode in libpq patch
> (acb7e4eb6b1c614c68a62fb3a6a5bba1af0a2659).  I need to spend more time on
> this and figure out how to repair it.  In the meantime, here is an updated
> patch set with the current status.

I looked at this a little bit to understand why it fails with \bind.  As
you say, it does interact badly with pipeline mode -- more precisely, it
collides with the queue handling that was added for pipeline.  The
problem is that in extended query mode, we "advance" the queue in
PQgetResult when asyncStatus is READY -- fe-exec.c line 2110 ff.  But
the protocol relies on returning READY when the second RowDescriptor
message is received (fe-protocol3.c line 319), so libpq gets confused
and everything blows up.  libpq needs the queue to stay put until all
the results from that query have been consumed.

If you comment out the pqCommandQueueAdvance() in fe-exec.c line 2124,
your example works correctly and no longer throws a libpq error (but of
course, other things break).

I suppose that in order for this to work, we would have to find another
way to "advance" the queue that doesn't rely on the status being
PGASYNC_READY.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Array initialisation notation in syscache.c

2022-12-21 Thread Nikita Malakhov
Hi!

Wanted to ask this since I encountered a need for a cache with 5 keys -
why is the syscache index still limited to 4 keys?

Thanks!

On Wed, Dec 21, 2022 at 7:36 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 21.12.22 04:16, Thomas Munro wrote:
> > On Wed, Dec 21, 2022 at 1:33 PM Thomas Munro 
> wrote:
> >>  KEY(Anum_pg_attribute_attrelid,
> >>  Anum_pg_attribute_attnum),
> >
> > I independently rediscovered that our VA_ARGS_NARGS() macro in c.h
> > always returns 1 on MSVC via trial-by-CI.  Derp.  Here is the same
> > patch, no change from v2, but this time accompanied by Victor Spirin's
> > fix, which I found via one of the tab-completion-is-busted-on-Windows
> > discussions.  I can't supply a useful commit message, because I
> > haven't understood why it works, but it does indeed seem to work and
> > this should make cfbot green.
>
> This looks like a good improvement to me.
>
> (I have also thought about having this generated from the catalog
> definition files somehow, but one step at a time ...)
>
>
>
>

-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: Small miscellaneus fixes (Part II)

2022-12-21 Thread Ranier Vilela
Thanks for looking at this.

Em ter., 20 de dez. de 2022 às 21:51, Justin Pryzby 
escreveu:

> On Fri, Nov 25, 2022 at 06:27:04PM -0300, Ranier Vilela wrote:
> > 5. Use boolean operator with boolean operands
> > (b/src/backend/commands/tablecmds.c)
>
> tablecmds.c: right.  Since 074c5cfbf
>
> pg_dump.c: right.  Since b08dee24a
>
> > 4. Fix dead code (src/backend/utils/adt/formatting.c)
> > Np->sign == '+', is different than "!= '-'" and is different than "!=
> '+'"
> > So the else is never hit.
>
> formatting.c: I don't see the problem.
>
> if (Np->sign != '-')
> ...
> else if (Np->sign != '+' && IS_PLUS(Np->Num))
> ...
>
> You said that the "else" is never hit, but why ?
>
Maybe this part of the patch is wrong.
The only case not handled by the first if is sign == '-', which is what the
else branch handles. In that branch, sign != '+' is therefore always true,
because sign == '+' would have been caught by the first if.

regards,
Ranier Vilela


Re: Get access to the whole query in CustomScan path callback

2022-12-21 Thread Tom Lane
Amin  writes:
> The goal is to have access to all the tables that are being scanned or will
> be scanned as a part of the query. Basically, the callback looks like this:

> typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
> RelOptInfo *rel,
> Index rti,
> RangeTblEntry *rte);

> Now, the problem is when there is a nested query, the function will be
> called once for the parent query and once for the subquery. However, I need
> access to the whole query in this function. There seems to be no CustomScan
> callback before this that has the whole query passed to it. Is there any
> way I can get access to the complete query (or all the relations in the
> query) by using the parameters passed to this function? Or any other
> workaround?

Everything the planner knows is accessible via the "root" pointer.

I very strongly question the idea that a custom scan provider should
be doing what you say you want to do, but the info is there.
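
For illustration only, a minimal sketch of a hook that walks the current
planning level's range table via root (the hook body is invented, not taken
from any existing extension):

#include "postgres.h"

#include "nodes/parsenodes.h"
#include "nodes/pathnodes.h"

/*
 * Hypothetical hook body: log every plain relation in the current
 * planning level's range table.  When planning a subquery, the outer
 * levels are reachable through root->parent_root.
 */
static void
my_set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
					Index rti, RangeTblEntry *rte)
{
	for (int i = 1; i < root->simple_rel_array_size; i++)
	{
		RangeTblEntry *entry = root->simple_rte_array[i];

		if (entry != NULL && entry->rtekind == RTE_RELATION)
			elog(DEBUG1, "this query level references relation %u", entry->relid);
	}
}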

regards, tom lane




Re: meson and tmp_install

2022-12-21 Thread Nikita Malakhov
Hi!

That's great, thanks! The discussion thread is very long, so I missed this
topic.
Just a suggestion: I've checked the link above; maybe a small section should
be added on where the build files are located and how to add new source
files so that the build succeeds?

On Tue, Dec 20, 2022 at 9:22 PM Andres Freund  wrote:

> Hi,
>
> On 2022-12-20 21:11:26 +0300, Nikita Malakhov wrote:
> > Didn't know where to ask, so I've chosen this thread - there is no any
> > documentation on meson build platform in PostgreSQL docs.
>
> There is now:
> https://www.postgresql.org/docs/devel/install-meson.html
>
> Needs further work, but it's a start.
>
>
> > Is this okay? For me it was a surprise when the meson platform was
> > added
>
> It's been discussed on the list for a year or so before it was
> added. It's a large change, so unfortunately it's not something that I
> could get done in a single day, with perfect docs from the get go.
>
> Greetings,
>
> Andres Freund
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Get access to the whole query in CustomScan path callback

2022-12-21 Thread Amin
Hi,

The goal is to have access to all the tables that are being scanned or will
be scanned as a part of the query. Basically, the callback looks like this:

typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
RelOptInfo *rel,
Index rti,
RangeTblEntry *rte);

Now, the problem is when there is a nested query, the function will be
called once for the parent query and once for the subquery. However, I need
access to the whole query in this function. There seems to be no CustomScan
callback before this that has the whole query passed to it. Is there any
way I can get access to the complete query (or all the relations in the
query) by using the parameters passed to this function? Or any other
workaround?

Thank you and happy holidays!


Re: Array initialisation notation in syscache.c

2022-12-21 Thread Peter Eisentraut

On 21.12.22 04:16, Thomas Munro wrote:

On Wed, Dec 21, 2022 at 1:33 PM Thomas Munro  wrote:

 KEY(Anum_pg_attribute_attrelid,
 Anum_pg_attribute_attnum),


I independently rediscovered that our VA_ARGS_NARGS() macro in c.h
always returns 1 on MSVC via trial-by-CI.  Derp.  Here is the same
patch, no change from v2, but this time accompanied by Victor Spirin's
fix, which I found via one of the tab-completion-is-busted-on-Windows
discussions.  I can't supply a useful commit message, because I
haven't understood why it works, but it does indeed seem to work and
this should make cfbot green.


This looks like a good improvement to me.

(I have also thought about having this generated from the catalog 
definition files somehow, but one step at a time ...)






Re: float4in_internal

2022-12-21 Thread Tom Lane
Andrew Dunstan  writes:
> The attached patch factors out the guts of float4in so that the new
> float4in_internal function is callable without going through the fmgr
> call sequence. This will make adjusting the seg module's input function
> to handle soft errors simpler. A similar operation was done for float8in
> some years ago in commit 50861cd683e. The new function has an identical
> argument structure to float8in_internal.

Looks reasonable except for one nitpick: the "out of range" message
in the ERANGE case should be kept mentioning real, not the passed
type_name, to be equivalent to the way float8in_internal does it.
I lack enough caffeine to recall exactly why float8in_internal
does it that way, but the comments are exceedingly clear that it was
intentional, and I'm sure the same rationale would apply here.

(float8in_internal also goes out of its way to show just the part of
the string that is the number in that case, but I'm willing to let
that pass for now.)

regards, tom lane




Re: generic plans and "initial" pruning

2022-12-21 Thread Tom Lane
Alvaro Herrera  writes:
> This version of the patch looks not entirely unreasonable to me.  I'll
> set this as Ready for Committer in case David or Tom or someone else
> want to have a look and potentially commit it.

I will have a look during the January CF.

regards, tom lane




float4in_internal

2022-12-21 Thread Andrew Dunstan
The attached patch factors out the guts of float4in so that the new
float4in_internal function is callable without going through the fmgr
call sequence. This will make adjusting the seg module's input function
to handle soft errors simpler. A similar operation was done for float8in
some years ago in commit 50861cd683e. The new function has an identical
argument structure to float8in_internal.

We could probably call these two internal functions directly in
numeric_float4() and numeric_float8() - not sure if it's worth it right
now, but we might end up wanting something like that for error-safe casts.
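
For illustration, a hedged sketch of how a caller might use it (assuming the
soft-error helpers in nodes/miscnodes.h; "str" and "escontext" are taken to
come from the caller's context):

	char	   *endptr;
	float4		val;

	val = float4in_internal(str, &endptr, "seg", str, escontext);
	if (SOFT_ERROR_OCCURRED(escontext))
		return false;			/* the details are already in escontext */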


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
From be571575e54c1b24617c54dbcd3daa3826d7e15a Mon Sep 17 00:00:00 2001
From: Andrew Dunstan 
Date: Wed, 21 Dec 2022 08:37:17 -0500
Subject: [PATCH] Introduce float4in_internal

This is the guts of float4in, callable as a routine to input floats,
which will be useful in an upcoming patch for allowing soft errors in
the seg module's input function.

A similar operation was performed some years ago for float8in in
commit 50861cd683e.
---
 src/backend/utils/adt/float.c | 52 ---
 src/include/utils/float.h |  3 ++
 2 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index b02a19be24..5a9b7fbe5b 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -163,17 +163,35 @@ Datum
 float4in(PG_FUNCTION_ARGS)
 {
 	char	   *num = PG_GETARG_CSTRING(0);
-	Node	   *escontext = fcinfo->context;
-	char	   *orig_num;
+
+	PG_RETURN_FLOAT4(float4in_internal(num, NULL, "real", num,
+	   fcinfo->context));
+}
+
+/*
+ * float4in_internal - guts of float4in()
+ *
+ * This is exposed for use by functions that want a reasonably
+ * platform-independent way of inputting floats. The behavior is
+ * essentially like strtof + ereturn on error.
+ *
+ * Uses the same API as float8in_internal below, so most of its
+ * comments also apply here, except regarding use in geometric types.
+ */
+float4
+float4in_internal(char *num, char **endptr_p,
+  const char *type_name, const char *orig_string,
+  struct Node *escontext)
+{
 	float		val;
 	char	   *endptr;
 
 	/*
 	 * endptr points to the first character _after_ the sequence we recognized
-	 * as a valid floating point number. orig_num points to the original input
+	 * as a valid floating point number. orig_string points to the original
+	 * input
 	 * string.
 	 */
-	orig_num = num;
 
 	/* skip leading whitespace */
 	while (*num != '\0' && isspace((unsigned char) *num))
@@ -184,10 +202,10 @@ float4in(PG_FUNCTION_ARGS)
 	 * strtod() on different platforms.
 	 */
 	if (*num == '\0')
-		ereturn(escontext, (Datum) 0,
+		ereturn(escontext, 0,
 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  errmsg("invalid input syntax for type %s: \"%s\"",
-		"real", orig_num)));
+		type_name, orig_string)));
 
 	errno = 0;
 	val = strtof(num, );
@@ -258,30 +276,32 @@ float4in(PG_FUNCTION_ARGS)
 (val >= HUGE_VALF || val <= -HUGE_VALF)
 #endif
 )
-ereturn(escontext, (Datum) 0,
+ereturn(escontext, 0,
 		(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-		 errmsg("\"%s\" is out of range for type real",
-orig_num)));
+		 errmsg("\"%s\" is out of range for type %s",
+orig_string, type_name)));
 		}
 		else
-			ereturn(escontext, (Datum) 0,
+			ereturn(escontext, 0,
 	(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 	 errmsg("invalid input syntax for type %s: \"%s\"",
-			"real", orig_num)));
+			type_name, orig_string)));
 	}
 
 	/* skip trailing whitespace */
 	while (*endptr != '\0' && isspace((unsigned char) *endptr))
 		endptr++;
 
-	/* if there is any junk left at the end of the string, bail out */
-	if (*endptr != '\0')
-		ereturn(escontext, (Datum) 0,
+	/* report stopping point if wanted, else complain if not end of string */
+	if (endptr_p)
+		*endptr_p = endptr;
+	else if (*endptr != '\0')
+		ereturn(escontext, 0,
 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
  errmsg("invalid input syntax for type %s: \"%s\"",
-		"real", orig_num)));
+		type_name, orig_string)));
 
-	PG_RETURN_FLOAT4(val);
+	return val;
 }
 
 /*
diff --git a/src/include/utils/float.h b/src/include/utils/float.h
index f92860b4a4..60e897be75 100644
--- a/src/include/utils/float.h
+++ b/src/include/utils/float.h
@@ -44,6 +44,9 @@ extern int	is_infinite(float8 val);
 extern float8 float8in_internal(char *num, char **endptr_p,
 const char *type_name, const char *orig_string,
 struct Node *escontext);
+extern float4 float4in_internal(char *num, char **endptr_p,
+const char *type_name, const char *orig_string,
+struct Node *escontext);
 extern char *float8out_internal(float8 num);
 extern int	float4_cmp_internal(float4 a, float4 b);
 extern int	float8_cmp_internal(float8 a, float8 b);
-- 
2.34.1



Re: Force streaming every change in logical decoding

2022-12-21 Thread Masahiko Sawada
On Wed, Dec 21, 2022 at 10:14 PM shiy.f...@fujitsu.com
 wrote:
>
> On Wed, Dec 21, 2022 4:54 PM Amit Kapila  wrote:
> >
> > On Wed, Dec 21, 2022 at 1:55 PM Peter Smith 
> > wrote:
> > >
> > > On Wed, Dec 21, 2022 at 6:22 PM Masahiko Sawada
> >  wrote:
> > > >
> > > > On Tue, Dec 20, 2022 at 7:49 PM Amit Kapila 
> > wrote:
> > > > >
> > > > > On Tue, Dec 20, 2022 at 2:46 PM Hayato Kuroda (Fujitsu)
> > > > >  wrote:
> > > > > >
> > > > > > Dear hackers,
> > > > > >
> > > > > > > We have discussed three different ways to provide GUC for these
> > > > > > > features. (1) Have separate GUCs like force_server_stream_mode,
> > > > > > > force_server_serialize_mode, force_client_serialize_mode (we can
> > use
> > > > > > > different names for these) for each of these; (2) Have two sets of
> > > > > > > GUCs for server and client. We can have logical_decoding_mode with
> > > > > > > values as 'stream' and 'serialize' for the server and then
> > > > > > > logical_apply_serialize = true/false for the client. (3) Have one 
> > > > > > > GUC
> > > > > > > like logical_replication_mode with values as 'server_stream',
> > > > > > > 'server_serialize', 'client_serialize'.
> > > > > >
> > > > > > I also agreed for adding new GUC parameters (and I have already done
> > partially
> > > > > > in parallel apply[1]), and basically options 2 made sense for me. 
> > > > > > But is
> > it OK
> > > > > > that we can choose "serialize" mode even if subscribers require
> > streaming?
> > > > > >
> > > > > > Currently the reorder buffer transactions are serialized on 
> > > > > > publisher
> > only when
> > > > > > the there are no streamable transaction. So what happen if the
> > > > > > logical_decoding_mode = "serialize" but streaming option streaming 
> > > > > > is
> > on? If we
> > > > > > break the first one and serialize changes on publisher anyway, it 
> > > > > > may
> > be not
> > > > > > suitable for testing the normal operation.
> > > > > >
> > > > >
> > > > > I think the change will be streamed as soon as the next change is
> > > > > processed even if we serialize based on this option. See
> > > > > ReorderBufferProcessPartialChange. However, I see your point that
> > when
> > > > > the streaming option is given, the value 'serialize' for this GUC may
> > > > > not make much sense.
> > > > >
> > > > > > Therefore, I came up with the variant of (2): logical_decoding_mode
> > can be
> > > > > > "normal" or "immediate".
> > > > > >
> > > > > > "normal" is a default value, which is same as current HEAD. Changes
> > are streamed
> > > > > > or serialized when the buffered size exceeds
> > logical_decoding_work_mem.
> > > > > >
> > > > > > When users set to "immediate", the walsenders starts to stream or
> > serialize all
> > > > > > changes. The choice is depends on the subscription option.
> > > > > >
> > > > >
> > > > > The other possibility to achieve what you are saying is that we allow
> > > > > a minimum value of logical_decoding_work_mem as 0 which would
> > mean
> > > > > stream or serialize each change depending on whether the streaming
> > > > > option is enabled. I think we normally don't allow a minimum value
> > > > > below a certain threshold for other *_work_mem parameters (like
> > > > > maintenance_work_mem, work_mem), so we have followed the same
> > here.
> > > > > And, I think it makes sense from the user's perspective because below
> > > > > a certain threshold it will just add overhead by either writing small
> > > > > changes to the disk or by sending those over the network. However, it
> > > > > can be quite useful for testing/debugging. So, not sure, if we should
> > > > > restrict setting logical_decoding_work_mem below a certain threshold.
> > > > > What do you think?
> > > >
> > > > I agree with (2), having separate GUCs for publisher side and
> > > > subscriber side. Also, on the publisher side, Amit's idea, controlling
> > > > the logical decoding behavior by changing logical_decoding_work_mem,
> > > > seems like a good idea.
> > > >
> > > > But I'm not sure it's a good idea if we lower the minimum value of
> > > > logical_decoding_work_mem to 0. I agree it's helpful for testing and
> > > > debugging but setting logical_decoding_work_mem = 0 doesn't benefit
> > > > users at all, rather brings risks.
> > > >
> > > > I prefer the idea Kuroda-san previously proposed; setting
> > > > logical_decoding_mode = 'immediate' means setting
> > > > logical_decoding_work_mem = 0. We might not need to have it as an
> > enum
> > > > parameter since it has only two values, though.
> > >
> > > Did you mean one GUC (logical_decoding_mode) will cause a side-effect
> > > implicit value change on another GUC value
> > > (logical_decoding_work_mem)?
> > >
> >
> > I don't think that is required. The only value that can be allowed for
> > logical_decoding_mode will be "immediate", something like we do for
> > recovery_target. The default will be "". The "immediate" value will
> > mean that stream each change if the 

RE: Force streaming every change in logical decoding

2022-12-21 Thread shiy.f...@fujitsu.com
On Wed, Dec 21, 2022 4:05 PM Peter Smith  wrote:
> 
> Here are some review comments for patch v2.
> 
> Since the GUC is still under design maybe these comments can be
> ignored for now, but I guess similar comments will apply in future
> anyhow (just with some name changes).
> 

Thanks for your comments.

> ==
> 
> 3. More docs.
> 
> It might be helpful if this developer option is referenced also on the
> page "31.10.1 Logical Replication > Configuration Settings >
> Publishers" [1]
> 

The newly added GUC is a developer option, and it seems that most developer
options are not referenced on other pages. So I am not sure we need to add
this to other docs.

The other comments are fixed in [1]. (Some of them no longer apply because of
the new design.)

[1] 
https://www.postgresql.org/message-id/OSZPR01MB6310AAE12BC281158880380DFDEB9%40OSZPR01MB6310.jpnprd01.prod.outlook.com

Regards,
Shi yu


RE: Force streaming every change in logical decoding

2022-12-21 Thread shiy.f...@fujitsu.com
On Wed, Dec 21, 2022 4:54 PM Amit Kapila  wrote:
> 
> On Wed, Dec 21, 2022 at 1:55 PM Peter Smith 
> wrote:
> >
> > On Wed, Dec 21, 2022 at 6:22 PM Masahiko Sawada
>  wrote:
> > >
> > > On Tue, Dec 20, 2022 at 7:49 PM Amit Kapila 
> wrote:
> > > >
> > > > On Tue, Dec 20, 2022 at 2:46 PM Hayato Kuroda (Fujitsu)
> > > >  wrote:
> > > > >
> > > > > Dear hackers,
> > > > >
> > > > > > We have discussed three different ways to provide GUC for these
> > > > > > features. (1) Have separate GUCs like force_server_stream_mode,
> > > > > > force_server_serialize_mode, force_client_serialize_mode (we can
> use
> > > > > > different names for these) for each of these; (2) Have two sets of
> > > > > > GUCs for server and client. We can have logical_decoding_mode with
> > > > > > values as 'stream' and 'serialize' for the server and then
> > > > > > logical_apply_serialize = true/false for the client. (3) Have one 
> > > > > > GUC
> > > > > > like logical_replication_mode with values as 'server_stream',
> > > > > > 'server_serialize', 'client_serialize'.
> > > > >
> > > > > I also agreed for adding new GUC parameters (and I have already done
> partially
> > > > > in parallel apply[1]), and basically options 2 made sense for me. But 
> > > > > is
> it OK
> > > > > that we can choose "serialize" mode even if subscribers require
> streaming?
> > > > >
> > > > > Currently the reorder buffer transactions are serialized on publisher
> only when
> > > > > the there are no streamable transaction. So what happen if the
> > > > > logical_decoding_mode = "serialize" but streaming option streaming is
> on? If we
> > > > > break the first one and serialize changes on publisher anyway, it may
> be not
> > > > > suitable for testing the normal operation.
> > > > >
> > > >
> > > > I think the change will be streamed as soon as the next change is
> > > > processed even if we serialize based on this option. See
> > > > ReorderBufferProcessPartialChange. However, I see your point that
> when
> > > > the streaming option is given, the value 'serialize' for this GUC may
> > > > not make much sense.
> > > >
> > > > > Therefore, I came up with the variant of (2): logical_decoding_mode
> can be
> > > > > "normal" or "immediate".
> > > > >
> > > > > "normal" is a default value, which is same as current HEAD. Changes
> are streamed
> > > > > or serialized when the buffered size exceeds
> logical_decoding_work_mem.
> > > > >
> > > > > When users set to "immediate", the walsenders starts to stream or
> serialize all
> > > > > changes. The choice is depends on the subscription option.
> > > > >
> > > >
> > > > The other possibility to achieve what you are saying is that we allow
> > > > a minimum value of logical_decoding_work_mem as 0 which would
> mean
> > > > stream or serialize each change depending on whether the streaming
> > > > option is enabled. I think we normally don't allow a minimum value
> > > > below a certain threshold for other *_work_mem parameters (like
> > > > maintenance_work_mem, work_mem), so we have followed the same
> here.
> > > > And, I think it makes sense from the user's perspective because below
> > > > a certain threshold it will just add overhead by either writing small
> > > > changes to the disk or by sending those over the network. However, it
> > > > can be quite useful for testing/debugging. So, not sure, if we should
> > > > restrict setting logical_decoding_work_mem below a certain threshold.
> > > > What do you think?
> > >
> > > I agree with (2), having separate GUCs for publisher side and
> > > subscriber side. Also, on the publisher side, Amit's idea, controlling
> > > the logical decoding behavior by changing logical_decoding_work_mem,
> > > seems like a good idea.
> > >
> > > But I'm not sure it's a good idea if we lower the minimum value of
> > > logical_decoding_work_mem to 0. I agree it's helpful for testing and
> > > debugging but setting logical_decoding_work_mem = 0 doesn't benefit
> > > users at all, rather brings risks.
> > >
> > > I prefer the idea Kuroda-san previously proposed; setting
> > > logical_decoding_mode = 'immediate' means setting
> > > logical_decoding_work_mem = 0. We might not need to have it as an
> enum
> > > parameter since it has only two values, though.
> >
> > Did you mean one GUC (logical_decoding_mode) will cause a side-effect
> > implicit value change on another GUC value
> > (logical_decoding_work_mem)?
> >
> 
> I don't think that is required. The only value that can be allowed for
> logical_decoding_mode will be "immediate", something like we do for
> recovery_target. The default will be "". The "immediate" value will
> mean streaming each change if the "streaming" option is enabled
> ("on" or "parallel"), or, if "streaming" is not enabled, serializing
> each change.
> 

I agreed and updated the patch as Amit suggested.
Please see the attached patch.

Regards,
Shi yu


v3-0001-Allow-streaming-or-serializing-each-change-in-log.patch
Description:  

Re: postgres_fdw: using TABLESAMPLE to collect remote sample

2022-12-21 Thread Tomas Vondra
On 12/15/22 17:46, Tom Lane wrote:
> James Finnerty  writes:
>> This patch looks good to me.  I have two very minor nits: The inflation
>> of the sample size by 10% is arbitrary but it doesn't seem unreasonable
>> or concerning.  It just makes me curious if there are any known cases
>> that motivated adding this logic.
> 
> I wondered why, too.
> 

Not sure about known cases, but the motivation is explained in the
comment before the 10% is applied.

The reltuples value is never going to be spot on, and we use it to
determine what fraction of the table to sample (because all built-in
TABLESAMPLE methods only accept a fraction to sample, not an expected
number of rows).

If pg_class.reltuples is lower than the actual row count, we'll end up
with a sample fraction that is too high. That's mostly harmless, as we'll
then discard some of the rows locally.

But if pg_class.reltuples is higher, we'll end up sampling too few
rows (fewer than targrows). This can happen e.g. after a bulk delete.

Yes, the 10% value is mostly arbitrary, and maybe it's pointless. How
much can the stats change with a 10% larger sample? Probably not much, so
is it really solving anything?
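
For illustration, the idea boils down to something like this (a sketch with
invented names, not the actual postgres_fdw code):

/*
 * Sketch only; "reltuples" is the remote pg_class.reltuples estimate and
 * "targrows" the number of sample rows we want.
 */
static double
estimate_sample_frac(double reltuples, double targrows)
{
	double		frac;

	if (reltuples <= 0)
		return 1.0;				/* no usable estimate, sample everything */

	frac = targrows / reltuples;
	frac *= 1.10;				/* the (arbitrary) 10% safety margin */

	return Min(1.0, frac);
}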

Also, maybe it's fixing the issue at the wrong level - if stale
reltuples are an issue, maybe the right fix is making it more accurate
on the source instance. Decrease autovacuum_vacuum_scale_factor, or
maybe also look at pg_stat_all_tables or something.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2022-12-21 Thread Amit Kapila
On Tue, Dec 20, 2022 at 8:14 PM Melih Mutlu  wrote:
>
> Amit Kapila , 16 Ara 2022 Cum, 05:46 tarihinde şunu 
> yazdı:
>>
>> Right, but when the size is 100MB, it seems to be taking a bit more
>> time. Do we want to evaluate with different sizes to see how it looks?
>> Other than that all the numbers are good.
>
>
> I did similar testing again, with 100 MB and also with 1 GB this time.
>
>         | 100 MB        | 1 GB
> --------+---------------+----------------
> master  | 14761.425 ms  | 160932.982 ms
> patch   | 14398.408 ms  | 160593.078 ms
>
> This time, the patch seems slightly faster than master.
> Not sure if we can say the patch slows things down (or speeds up) as the size
> of tables increases.
> The difference may be something arbitrary or caused by other factors. What do
> you think?
>

Yes, I agree with you as I also can't see an obvious reason for any
slowdown with this patch's idea.

-- 
With Regards,
Amit Kapila.




Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-21 Thread Amit Kapila
On Wed, Dec 21, 2022 at 11:02 AM houzj.f...@fujitsu.com
 wrote:
>
>
> Attach the new patch set which also includes some
> cosmetic comment changes.
>

I noticed one problem with the recent change in the patch.

+ * The fileset state should become FS_SERIALIZE_DONE once we have waited
+ * for a lock in the FS_SERIALIZE_IN_PROGRESS state, so we get the state
+ * again and recheck it later.
+ */
+ if (fileset_state == FS_SERIALIZE_IN_PROGRESS)
+ {
+ pa_lock_stream(MyParallelShared->xid, AccessShareLock);
+ pa_unlock_stream(MyParallelShared->xid, AccessShareLock);
+
+ fileset_state = pa_get_fileset_state();
+ Assert(fileset_state == FS_SERIALIZE_DONE);

This is not always true: if, say, due to a deadlock this lock is released
by the leader worker, the fileset state will still be in progress. So, I
think we need a change like the below:
diff --git a/src/backend/replication/logical/applyparallelworker.c
b/src/backend/replication/logical/applyparallelworker.c
index 45faa74596..8076786f0d 100644
--- a/src/backend/replication/logical/applyparallelworker.c
+++ b/src/backend/replication/logical/applyparallelworker.c
@@ -686,8 +686,8 @@ pa_spooled_messages(void)
 * the leader had serialized all changes which can lead to undetected
 * deadlock.
 *
-* The fileset state must be FS_SERIALIZE_DONE once the leader
worker has
-* finished serializing the changes.
+* Note that the fileset state can be FS_SERIALIZE_DONE once the leader
+* worker has finished serializing the changes.
 */
if (fileset_state == FS_SERIALIZE_IN_PROGRESS)
{
@@ -695,7 +695,6 @@ pa_spooled_messages(void)
pa_unlock_stream(MyParallelShared->xid, AccessShareLock);

fileset_state = pa_get_fileset_state();
-   Assert(fileset_state == FS_SERIALIZE_DONE);

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL

2022-12-21 Thread Bharath Rupireddy
On Fri, Dec 16, 2022 at 4:47 AM David Christensen
 wrote:
>
> On Thu, Dec 15, 2022 at 12:36 AM Michael Paquier  wrote:
> >
> > On Wed, Dec 14, 2022 at 04:44:34PM -0600, David Christensen wrote:
> > > I can get one sent in tomorrow.
>
> This v10 should incorporate your feedback as well as Bharath's.

Thanks for the patch. Here're some minor comments:

1. +my $node =  PostgreSQL::Test::Cluster->new('primary');
Can the name be something other than 'primary', since we don't create a
standby for this test? Something like 'node_a' or 'node_extract_fpi'.

2. +$node->init(extra => ['-k'], allows_streaming => 1);
When allows_streaming is enabled, a bunch of things happen to the node
while initializing; I don't think we need all of them for this.

3. +$node->init(extra => ['-k'], allows_streaming => 1);
Can we use --data-checksums instead of -k for better readability?
Perhaps a comment on why we need that option would help greatly.

4.
+page = (Page) buf.data;
+
+if (!XLogRecHasBlockRef(record, block_id))
+continue;
+
+if (!XLogRecHasBlockImage(record, block_id))
+continue;
+
+if (!RestoreBlockImage(record, block_id, page))
+continue;
Can you move page = (Page) buf.data; to just before the last if condition
(the RestoreBlockImage() call), so that it isn't executed when either of the
other two continue statements is taken?
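
In other words, something like this ordering (a sketch of the suggestion, not
the actual patch):

	if (!XLogRecHasBlockRef(record, block_id))
		continue;

	if (!XLogRecHasBlockImage(record, block_id))
		continue;

	/* the page buffer is only needed once we know we will restore */
	page = (Page) buf.data;

	if (!RestoreBlockImage(record, block_id, page))
		continue;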

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Inconsistency in reporting checkpointer stats

2022-12-21 Thread Nitin Jadhav
Thanks Robert and Andres for sharing your thoughts.

I have modified the code accordingly and attached new versions of the
patches. Patch 0001 fixes the inconsistency in checkpointer stats, and
patch 0002 separates the main buffer and SLRU buffer counts in the
checkpoint-complete log message. In 0001, I added a new column to the
pg_stat_bgwriter view, named slru_buffers_checkpoint, and kept the
existing column buffers_checkpoint as-is. Should I rename this to
something like main_buffers_checkpoint? Thoughts?

Please refer to the sample checkpoint-complete log message [1]. I am not
quite satisfied with the percentage-of-buffers-written information logged
there. The percentage is calculated based on NBuffers in both cases, so I
am worried we may be passing misleading information to the user: the user
may think that the percentage of main buffers is based on the total number
of main buffers available and the percentage of SLRU buffers is based on
the total number of SLRU buffers available.

Kindly review and share your comments.

[1]:
2022-12-21 10:52:25.931 UTC [63530] LOG:  checkpoint complete: wrote
4670 buffers (28.5%), wrote 3 slru buffers (0.0%); 0 WAL file(s)
added, 0 removed, 4 recycled; write=0.045 s, sync=0.161 s, total=0.244
s; sync files=25, longest=0.146 s, average=0.007 s; distance=66130 kB,
estimate=66130 kB; lsn=0/5557C78, redo lsn=0/5557C40

Thanks & Regards,
Nitin Jadhav

On Tue, Dec 20, 2022 at 11:08 PM Andres Freund  wrote:
>
> On 2022-12-20 08:18:36 -0500, Robert Haas wrote:
> > I think that the SLRU information is potentially useful, but mixing it
> > with the information about regular buffers just seems confusing.
>
> +1
>
> At least for now, it'd be different if/when we manage to move SLRUs to
> the main buffer pool.
>
> - Andres


v3-0002-Separate-main-buffer-and-SLRU-buffer-count-from-chec.patch
Description: Binary data


v3-0001-Fix-inconsistency-in-checkpointer-stats.patch
Description: Binary data


Re: Inconsistency in reporting checkpointer stats

2022-12-21 Thread Bharath Rupireddy
On Tue, Dec 20, 2022 at 11:08 PM Andres Freund  wrote:
>
> On 2022-12-20 08:18:36 -0500, Robert Haas wrote:
> > I think that the SLRU information is potentially useful, but mixing it
> > with the information about regular buffers just seems confusing.
>
> +1
>
> At least for now, it'd be different if/when we manage to move SLRUs to
> the main buffer pool.

+1 to not count SLRU writes in ckpt_bufs_written. If needed we can
have new fields CheckpointStats.ckpt_slru_bufs_written and
PendingCheckpointerStats.slru_buf_written_checkpoint.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: generic plans and "initial" pruning

2022-12-21 Thread Amit Langote
On Wed, Dec 21, 2022 at 7:18 PM Alvaro Herrera  wrote:
> This version of the patch looks not entirely unreasonable to me.  I'll
> set this as Ready for Committer in case David or Tom or someone else
> want to have a look and potentially commit it.

Thank you, Alvaro.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: daitch_mokotoff module

2022-12-21 Thread Dag Lem
Hi Andres,

Thank you for your detailed and constructive review!

I have made a conscientious effort to address all the issues you pointed
out; please see my comments below.

Andres Freund  writes:

> Hi,
>
> On 2022-02-03 15:27:32 +0100, Dag Lem wrote:

[...]

> [23:43:34.796] contrib/fuzzystrmatch/meson.build:18:0: ERROR: File
> fuzzystrmatch--1.1.sql does not exist.
>
>
>> -DATA = fuzzystrmatch--1.1.sql fuzzystrmatch--1.0--1.1.sql
>> +DATA = fuzzystrmatch--1.2.sql fuzzystrmatch--1.1--1.2.sql
>> fuzzystrmatch--1.0--1.1.sql
>>  PGFILEDESC = "fuzzystrmatch - similarities and distance between strings"
>
>
> The patch seems to remove fuzzystrmatch--1.1.sql - I suggest not doing
> that. In recent years our approach has been to just keep the "base version" of
> the upgrade script, with extension creation running through the upgrade
> scripts.
>

OK, I have now kept fuzzystrmatch--1.1.sql, and omitted
fuzzystrmatch--1.2.sql

Both the Makefile and meson.build are updated to handle the new files,
including the generated header.

>>  
>> +
>> +#include "daitch_mokotoff.h"
>> +
>> +#include "postgres.h"
>
> Postgres policy is that the include of "postgres.h" has to be the first
> include in every .c file.
>
>

OK, fixed.

>> +#include "utils/builtins.h"
>> +#include "mb/pg_wchar.h"
>> +
>> +#include 
>> +
>> +/* Internal C implementation */
>> +static char *_daitch_mokotoff(char *word, char *soundex, size_t n);
>> +
>> +
>> +PG_FUNCTION_INFO_V1(daitch_mokotoff);
>> +Datum
>> +daitch_mokotoff(PG_FUNCTION_ARGS)
>> +{
>> +text   *arg = PG_GETARG_TEXT_PP(0);
>> +char   *string,
>> +   *tmp_soundex;
>> +text   *soundex;
>> +
>> +/*
>> + * The maximum theoretical soundex size is several KB, however in
>> practice
>> + * anything but contrived synthetic inputs will yield a soundex size of
>> + * less than 100 bytes. We thus allocate and free a temporary work
>> buffer,
>> + * and return only the actual soundex result.
>> + */
>> + string = pg_server_to_any(text_to_cstring(arg),
>> VARSIZE_ANY_EXHDR(arg), PG_UTF8);
>> +tmp_soundex = palloc(DM_MAX_SOUNDEX_CHARS);
>
> Seems that just using StringInfo to hold the soundex output would work better
> than a static allocation?
>

OK, fixed.
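
For illustration, a minimal sketch of the StringInfo approach (a hypothetical
helper, not the actual patch):

#include "postgres.h"

#include "lib/stringinfo.h"
#include "utils/builtins.h"

/*
 * Hypothetical helper: build the space-separated list of soundex codes
 * in a StringInfo, so the output grows as needed instead of relying on
 * a fixed-size work buffer.
 */
static text *
soundex_codes_to_text(char **codes, int ncodes)
{
	StringInfoData buf;

	initStringInfo(&buf);
	for (int i = 0; i < ncodes; i++)
	{
		if (i > 0)
			appendStringInfoChar(&buf, ' ');
		appendStringInfoString(&buf, codes[i]);
	}

	return cstring_to_text_with_len(buf.data, buf.len);
}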

>
>> +if (!_daitch_mokotoff(string, tmp_soundex, DM_MAX_SOUNDEX_CHARS))
>
> We imo shouldn't introduce new functions starting with _.
>

OK, fixed. Note that I just followed the existing pattern in
fuzzystrmatch.c there.

[...]

>> +/* Find next node corresponding to code digit, or create a new node. */
>> +static dm_node * find_or_create_node(dm_nodes nodes, int *num_nodes,
>> + dm_node * node, char code_digit)
>
> PG code style is to have a line break between a function defintion's return
> type and the function name - like you actually do above.
>

OK, fixed. Both pgindent and I must have missed that particular
function.

>> +/* Mapping from ISO8859-1 to upper-case ASCII */
>> +static const char tr_iso8859_1_to_ascii_upper[] =
>> +/*
>> +"`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬
>> ®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ"
>> +*/
>> +"`ABCDEFGHIJKLMNOPQRSTUVWXYZ{|}~ !
>> ?AAECDNO*OYDSAAECDNO/OYDY";
>> +
>> +static char
>> +iso8859_1_to_ascii_upper(unsigned char c)
>> +{
>> +return c >= 0x60 ? tr_iso8859_1_to_ascii_upper[c - 0x60] : c;
>> +}
>> +
>> +
>> +/* Convert an UTF-8 character to ISO-8859-1.
>> + * Unconvertable characters are returned as '?'.
>> + * NB! Beware of the domain specific conversion of Ą, Ę, and Ţ/Ț.
>> + */
>> +static char
>> +utf8_to_iso8859_1(char *str, int *ix)
>
> It seems decidedly not great to have custom encoding conversion routines in a
> contrib module. Is there any way we can avoid this?
>

I have now replaced the custom UTF-8 decode with calls to
utf8_to_unicode and pg_utf_mblen, and simplified the subsequent
conversion to ASCII. Hopefully this makes the conversion code more
palatable.
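
For illustration, the decode loop can be sketched roughly like this (not the
actual patch; the ASCII mapping is left as a placeholder):

	const unsigned char *p = (const unsigned char *) str;

	while (*p != '\0')
	{
		pg_wchar	cp = utf8_to_unicode(p);

		/* ... map cp to an uppercase ASCII letter, or '?' ... */

		p += pg_utf_mblen(p);
	}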

I don't see how the conversion to ASCII could be substantially
simplified further. The conversion maps lowercase and 8 bit ISO8859-1
characters to ASCII via uppercasing, removal of accents, and discarding
of special characters. In addition to that, it maps (the non-ISO8859-1)
Ą, Ę, and Ţ/Ț from the coding chart to [, \, and ]. After this, a simple
O(1) table lookup can be used to retrieve the soundex code tree for a
letter sequence.

>
>> +/* Generate all Daitch-Mokotoff soundex codes for word, separated
>> by space. */
>> +static char *
>> +_daitch_mokotoff(char *word, char *soundex, size_t n)
>> +{
>> +int i = 0,
>> +j;
>> +int letter_no = 0;
>> +int ix_leaves = 0;
>> +int num_nodes = 0,
>> +num_leaves = 0;
>> +dm_codes   *codes,
>> +   *next_codes;
>> +dm_node*nodes;
>> +dm_leaves  

Re: generic plans and "initial" pruning

2022-12-21 Thread Alvaro Herrera
This version of the patch looks not entirely unreasonable to me.  I'll
set this as Ready for Committer in case David or Tom or someone else
want to have a look and potentially commit it.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




RE: Force streaming every change in logical decoding

2022-12-21 Thread Hayato Kuroda (Fujitsu)
Dear Shveta, other hackers

> Going with ' logical_decoding_work_mem' seems a reasonable solution, but 
> since we are mixing
> the functionality of developer and production GUC, there is a slight risk 
> that customer/DBAs may end
> up setting it to 0 and forget about it and thus hampering system's 
> performance.
> Have seen many such cases in previous org.

That never crossed my mind at all. Indeed, a DBA may be confused; this
proposal seems to be over-optimized.
I can withdraw it and we can go another way, maybe my previous approach.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Minimal logical decoding on standbys

2022-12-21 Thread Drouvot, Bertrand

Hi,

On 12/20/22 10:41 PM, Robert Haas wrote:

On Tue, Dec 20, 2022 at 3:39 PM Robert Haas  wrote:

I think this might be the only WAL record type where there's a
problem, but I haven't fully confirmed that yet.


It's not. GIST has the same issue. The same test case demonstrates the
problem there, if you substitute this test script for
kpt_hash_setup.sql and possibly also run it for somewhat longer. One
might think that this wouldn't be a problem, because the comments for
gistxlogDelete say this:

  /*
   * In payload of blk 0 : todelete OffsetNumbers
   */

But it's not in the payload of blk 0. It follows the main payload.


Oh right, nice catch!

Indeed, we can see in gistRedoDeleteRecord():

"
todelete = (OffsetNumber *) ((char *) xldata + SizeOfGistxlogDelete);
"



This is the reverse of xl_heap_freeze_page, which claims that freeze
plans and offset numbers follow, but they don't: they're in the data
for block 0. 


oh right, we can see in heap_xlog_freeze_page():

"
plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, 
NULL);
offsets = (OffsetNumber *) ((char *) plans +

(xlrec->nplans *
 
sizeof(xl_heap_freeze_plan)));
"



xl_btree_delete is also wrong, claiming that the data
follows when it's really attached to block 0. 



oh right, we can see in btree_xlog_delete():

"
char   *ptr = XLogRecGetBlockData(record, 0, NULL);

page = (Page) BufferGetPage(buffer);

if (xlrec->nupdated > 0)
{
OffsetNumber *updatedoffsets;
xl_btree_update *updates;

updatedoffsets = (OffsetNumber *)
(ptr + xlrec->ndeleted * sizeof(OffsetNumber));
updates = (xl_btree_update *) ((char *) updatedoffsets +
   
xlrec->nupdated *

   sizeof(OffsetNumber));
"




I guess whatever else we
do here, we should fix the comments.



Agree, please find attached a patch proposal doing so.



Bottom line is that I think the two cases that have alignment issues
as coded are xl_hash_vacuum_one_page and gistxlogDelete. Everything
else is OK, as far as I can tell right now.



Thanks a lot for the repro(s) and explanations! That's very useful/helpful.

Based on your discovery about the wrong comments above, I'm now tempted to fix 
those 2 alignment issues
by using a FLEXIBLE_ARRAY_MEMBER within those structs (as you proposed in [1]) 
(as that should also prevent
any possible wrong comments about where the array is located).

What do you think?

[1]: 
https://www.postgresql.org/message-id/CA%2BTgmoaVcu_mbxbH%3DEccvKG6u8%2BMdQf9zx98uAL9zsStFwrYsQ%40mail.gmail.com

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index 33f1c7e31b..95068feb87 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -52,9 +52,7 @@ typedef struct gistxlogDelete
TransactionId snapshotConflictHorizon;
uint16  ntodelete;  /* number of deleted offsets */
 
-   /*
-* In payload of blk 0 : todelete OffsetNumbers
-*/
+   /* TODELETE OFFSET NUMBER ARRAY FOLLOWS */
 } gistxlogDelete;
 
 #define SizeOfGistxlogDelete   (offsetof(gistxlogDelete, ntodelete) + sizeof(uint16))
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 5c77290eec..1117e95ede 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -345,8 +345,9 @@ typedef struct xl_heap_freeze_page
TransactionId snapshotConflictHorizon;
uint16  nplans;
 
-   /* FREEZE PLANS FOLLOW */
-   /* OFFSET NUMBER ARRAY FOLLOWS */
+   /*
+* In payload of blk 0 : FREEZE PLANS and OFFSET NUMBER ARRAY
+*/
 } xl_heap_freeze_page;
 
 #define SizeOfHeapFreezePage (offsetof(xl_heap_freeze_page, nplans) + sizeof(uint16))
diff --git a/src/include/access/nbtxlog.h b/src/include/access/nbtxlog.h
index 3b2d959c69..1aa8e7eca5 100644
--- a/src/include/access/nbtxlog.h
+++ b/src/include/access/nbtxlog.h
@@ -236,9 +236,12 @@ typedef struct xl_btree_delete
uint16  ndeleted;
uint16  nupdated;
 
-   /* DELETED TARGET OFFSET NUMBERS FOLLOW */
-   /* UPDATED TARGET OFFSET NUMBERS FOLLOW */
-   /* UPDATED TUPLES METADATA (xl_btree_update) ARRAY FOLLOWS */
+   /*
+* In payload of blk 0 :
+* - DELETED TARGET OFFSET NUMBERS
+* - UPDATED TARGET OFFSET NUMBERS
+* - UPDATED 

Re: pgsql: Doc: Explain about Column List feature.

2022-12-21 Thread Peter Smith
On Wed, Dec 21, 2022 at 6:59 PM Alvaro Herrera  wrote:
>
> On 2022-Dec-21, Peter Smith wrote:
>
> > By "searching" I also meant just scanning visually, although I was
> > thinking more about scanning the PDF.
> >
> > Right now, the intention of any text box is obvious at a glance
> > because of those titles like "Caution", "Tip", "Note", "Warning".
> > Sure, the HTML rendering also uses colours to convey the purpose, but
> > in the PDF version [1] everything is black-and-white so apart from the
> > title all boxes look the same. That's why I felt using non-standard
> > box titles might be throwing away some of the meaning - e.g. the
> > reader of the PDF won't know anymore at a glance are they looking at a
> > warning or a tip.
>
> Oh, I see.  It's been so long that I haven't looked at the PDFs, that I
> failed to realize that they don't use color.  I agree that would be a
> problem.  Maybe we can change the title to have the word:
>
> Warning: Combining Column Lists from Multiple Publications
>

That last idea LGTM. But leaving it with no patch at all would also be fine by me.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Force streaming every change in logical decoding

2022-12-21 Thread Amit Kapila
On Wed, Dec 21, 2022 at 1:55 PM Peter Smith  wrote:
>
> On Wed, Dec 21, 2022 at 6:22 PM Masahiko Sawada  wrote:
> >
> > On Tue, Dec 20, 2022 at 7:49 PM Amit Kapila  wrote:
> > >
> > > On Tue, Dec 20, 2022 at 2:46 PM Hayato Kuroda (Fujitsu)
> > >  wrote:
> > > >
> > > > Dear hackers,
> > > >
> > > > > We have discussed three different ways to provide GUC for these
> > > > > features. (1) Have separate GUCs like force_server_stream_mode,
> > > > > force_server_serialize_mode, force_client_serialize_mode (we can use
> > > > > different names for these) for each of these; (2) Have two sets of
> > > > > GUCs for server and client. We can have logical_decoding_mode with
> > > > > values as 'stream' and 'serialize' for the server and then
> > > > > logical_apply_serialize = true/false for the client. (3) Have one GUC
> > > > > like logical_replication_mode with values as 'server_stream',
> > > > > 'server_serialize', 'client_serialize'.
> > > >
> > > > I also agree with adding new GUC parameters (and I have already done
> > > > so partially in parallel apply[1]), and basically option 2 made sense
> > > > to me. But is it OK that we can choose "serialize" mode even if
> > > > subscribers require streaming?
> > > >
> > > > Currently the reorder buffer transactions are serialized on the
> > > > publisher only when there is no streamable transaction. So what
> > > > happens if logical_decoding_mode = "serialize" but the streaming
> > > > option is on? If we break the first rule and serialize changes on the
> > > > publisher anyway, it may not be suitable for testing normal operation.
> > > >
> > >
> > > I think the change will be streamed as soon as the next change is
> > > processed even if we serialize based on this option. See
> > > ReorderBufferProcessPartialChange. However, I see your point that when
> > > the streaming option is given, the value 'serialize' for this GUC may
> > > not make much sense.
> > >
> > > > Therefore, I came up with a variant of (2): logical_decoding_mode can
> > > > be "normal" or "immediate".
> > > >
> > > > "normal" is the default value, which is the same as current HEAD.
> > > > Changes are streamed or serialized when the buffered size exceeds
> > > > logical_decoding_work_mem.
> > > >
> > > > When users set it to "immediate", the walsenders start to stream or
> > > > serialize all changes. The choice depends on the subscription option.
> > > >
> > >
> > > The other possibility to achieve what you are saying is that we allow
> > > a minimum value of logical_decoding_work_mem as 0 which would mean
> > > stream or serialize each change depending on whether the streaming
> > > option is enabled. I think we normally don't allow a minimum value
> > > below a certain threshold for other *_work_mem parameters (like
> > > maintenance_work_mem, work_mem), so we have followed the same here.
> > > And, I think it makes sense from the user's perspective because below
> > > a certain threshold it will just add overhead by either writing small
> > > changes to the disk or by sending those over the network. However, it
> > > can be quite useful for testing/debugging. So, not sure, if we should
> > > restrict setting logical_decoding_work_mem below a certain threshold.
> > > What do you think?
> >
> > I agree with (2), having separate GUCs for publisher side and
> > subscriber side. Also, on the publisher side, Amit's idea, controlling
> > the logical decoding behavior by changing logical_decoding_work_mem,
> > seems like a good idea.
> >
> > But I'm not sure it's a good idea if we lower the minimum value of
> > logical_decoding_work_mem to 0. I agree it's helpful for testing and
> > debugging but setting logical_decoding_work_mem = 0 doesn't benefit
> > users at all, rather brings risks.
> >
> > I prefer the idea Kuroda-san previously proposed; setting
> > logical_decoding_mode = 'immediate' means setting
> > logical_decoding_work_mem = 0. We might not need to have it as an enum
> > parameter since it has only two values, though.
>
> Did you mean one GUC (logical_decoding_mode) will cause an implicit
> side-effect value change on another GUC
> (logical_decoding_work_mem)?
>

I don't think that is required. The only value that can be allowed for
logical_decoding_mode will be "immediate", something like we do for
recovery_target. The default will be "". The "immediate" value will
mean streaming each change if the "streaming" option is enabled
("on" or "parallel"); if "streaming" is not enabled, it would mean
serializing each change.
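
To illustrate (purely a sketch; none of these names are from a posted patch),
such a recovery_target-style GUC could be declared roughly like this, with a
check hook that accepts only "" and "immediate":

    /* hypothetical guc_tables.c entry, among the string GUCs */
    {
        {"logical_decoding_mode", PGC_USERSET, DEVELOPER_OPTIONS,
            gettext_noop("Set to \"immediate\" to stream or serialize each change in logical decoding."),
            NULL,
            GUC_NOT_IN_SAMPLE
        },
        &logical_decoding_mode_string,
        "",
        check_logical_decoding_mode, NULL, NULL
    },

    /* hypothetical check hook, analogous to check_recovery_target() */
    static bool
    check_logical_decoding_mode(char **newval, void **extra, GucSource source)
    {
        if (strcmp(*newval, "") != 0 && strcmp(*newval, "immediate") != 0)
        {
            GUC_check_errdetail("The only allowed value is \"immediate\".");
            return false;
        }
        return true;
    }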

-- 
With Regards,
Amit Kapila.




Re: Force streaming every change in logical decoding

2022-12-21 Thread Peter Smith
On Wed, Dec 21, 2022 at 6:22 PM Masahiko Sawada  wrote:
>
> On Tue, Dec 20, 2022 at 7:49 PM Amit Kapila  wrote:
> >
> > On Tue, Dec 20, 2022 at 2:46 PM Hayato Kuroda (Fujitsu)
> >  wrote:
> > >
> > > Dear hackers,
> > >
> > > > We have discussed three different ways to provide GUC for these
> > > > features. (1) Have separate GUCs like force_server_stream_mode,
> > > > force_server_serialize_mode, force_client_serialize_mode (we can use
> > > > different names for these) for each of these; (2) Have two sets of
> > > > GUCs for server and client. We can have logical_decoding_mode with
> > > > values as 'stream' and 'serialize' for the server and then
> > > > logical_apply_serialize = true/false for the client. (3) Have one GUC
> > > > like logical_replication_mode with values as 'server_stream',
> > > > 'server_serialize', 'client_serialize'.
> > >
> > > I also agree with adding new GUC parameters (and I have already done
> > > so partially in parallel apply[1]), and basically option 2 made sense
> > > to me. But is it OK that we can choose "serialize" mode even if
> > > subscribers require streaming?
> > >
> > > Currently the reorder buffer transactions are serialized on the
> > > publisher only when there is no streamable transaction. So what happens
> > > if logical_decoding_mode = "serialize" but the streaming option is on?
> > > If we break the first rule and serialize changes on the publisher
> > > anyway, it may not be suitable for testing normal operation.
> > >
> >
> > I think the change will be streamed as soon as the next change is
> > processed even if we serialize based on this option. See
> > ReorderBufferProcessPartialChange. However, I see your point that when
> > the streaming option is given, the value 'serialize' for this GUC may
> > not make much sense.
> >
> > > Therefore, I came up with a variant of (2): logical_decoding_mode can be
> > > "normal" or "immediate".
> > >
> > > "normal" is the default value, which is the same as current HEAD.
> > > Changes are streamed or serialized when the buffered size exceeds
> > > logical_decoding_work_mem.
> > >
> > > When users set it to "immediate", the walsenders start to stream or
> > > serialize all changes. The choice depends on the subscription option.
> > >
> >
> > The other possibility to achieve what you are saying is that we allow
> > a minimum value of logical_decoding_work_mem as 0 which would mean
> > stream or serialize each change depending on whether the streaming
> > option is enabled. I think we normally don't allow a minimum value
> > below a certain threshold for other *_work_mem parameters (like
> > maintenance_work_mem, work_mem), so we have followed the same here.
> > And, I think it makes sense from the user's perspective because below
> > a certain threshold it will just add overhead by either writing small
> > changes to the disk or by sending those over the network. However, it
> > can be quite useful for testing/debugging. So, not sure, if we should
> > restrict setting logical_decoding_work_mem below a certain threshold.
> > What do you think?
>
> I agree with (2), having separate GUCs for publisher side and
> subscriber side. Also, on the publisher side, Amit's idea, controlling
> the logical decoding behavior by changing logical_decoding_work_mem,
> seems like a good idea.
>
> But I'm not sure it's a good idea if we lower the minimum value of
> logical_decoding_work_mem to 0. I agree it's helpful for testing and
> debugging but setting logical_decoding_work_mem = 0 doesn't benefit
> users at all, rather brings risks.
>
> I prefer the idea Kuroda-san previously proposed; setting
> logical_decoding_mode = 'immediate' means setting
> logical_decoding_work_mem = 0. We might not need to have it as an enum
> parameter since it has only two values, though.

Did you mean one GUC (logical_decoding_mode) will cause an implicit
side-effect value change on another GUC
(logical_decoding_work_mem)?

If so, then that seems like a potential source of confusion IMO.
- e.g. the actual value is invisibly set differently from what the user
sees in the conf file
- e.g. will it depend on the order in which they get assigned?

Are there any GUC precedents for something like that?

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PATCH] random_normal function

2022-12-21 Thread Vik Fearing

On 12/19/22 04:36, Michael Paquier wrote:

On Sat, Dec 17, 2022 at 05:49:15PM +0100, Fabien COELHO wrote:

Overall, I think that there should be a clearer discussion and plan about
which random functionS postgres should provide to complement the standard
instead of going there… randomly:-)


So, what does the specification tell us about seeds, normal and random
functions?



Nothing at all.
--
Vik Fearing





Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-12-21 Thread Masahiko Sawada
On Tue, Dec 20, 2022 at 3:09 PM John Naylor
 wrote:
>
>
> On Mon, Dec 19, 2022 at 2:14 PM Masahiko Sawada  wrote:
> >
> > On Tue, Dec 13, 2022 at 1:04 AM Masahiko Sawada  
> > wrote:
>
> > > Looking at other code using DSA such as tidbitmap.c and nodeHash.c, it
> > > seems that they look at only memory that are actually dsa_allocate'd.
> > > To be exact, we estimate the number of hash buckets based on work_mem
> > > (and hash_mem_multiplier) and use it as the upper limit. So I've
> > > confirmed that the result of dsa_get_total_size() could exceed the
> > > limit. I'm not sure it's a known and legitimate usage. If we can
> > > follow such usage, we can probably track how much dsa_allocate'd
> > > memory is used in the radix tree.
> >
> > I've experimented with this idea. The newly added 0008 patch changes
> > the radix tree so that it counts the memory usage for both local and
> > shared cases. As shown below, there is an overhead for that:
> >
> > w/o 0008 patch
> >  298453544 | 282
>
> > w/ 0008 patch
> >  293603184 | 297
>
> This adds about as much overhead as the improvement I measured in the v4 slab 
> allocator patch.

Oh, yes, that's bad.

> https://www.postgresql.org/message-id/20220704211822.kfxtzpcdmslzm2dy%40awork3.anarazel.de
>
> I'm guessing the hash join case can afford to be precise about memory because 
> it must spill to disk when exceeding work_mem. We don't have that design 
> constraint.

You mean that the memory used by the radix tree should be limited not
by the amount of memory actually used, but by the amount of memory
allocated? In other words, it would check MemoryContextMemAllocated() in
the local case and dsa_get_total_size() in the shared case.
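
In code terms, I mean something like the following (a rough sketch only; the
struct and field names here are made up and don't match the patch):

    static bool
    rt_memory_limit_exceeded(radix_tree *tree, Size limit)
    {
        int64       allocated;

        if (tree->area != NULL)     /* shared case */
            allocated = (int64) dsa_get_total_size(tree->area);
        else                        /* local case */
            allocated = MemoryContextMemAllocated(tree->context, true);

        return allocated >= (int64) limit;
    }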

The idea of using up to half of maintenance_work_mem might be a good
one compared to the current flat-array solution. But since it only
uses half, I'm concerned that there will be users who double their
maintenance_work_mem. When this is later improved, those users will need
to restore maintenance_work_mem again.

A better solution would be to have a slab-like DSA: we would allocate the
dynamic shared memory by adding fixed-length large segments. However, the
downside would be that, since the segment size gets large, we would need to
increase maintenance_work_mem as well. Also, this patch set is already
getting bigger and more complicated, so I don't think it's a good idea to
add more.

If we limit the memory usage by checking the amount of memory actually
used, we can use SlabStats() in the local case. Since DSA doesn't
have such functionality for now, we would need to add it. Or we could
track it in the radix tree only in the shared case.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Force streaming every change in logical decoding

2022-12-21 Thread Peter Smith
Here are some review comments for patch v2.

Since the GUC is still under design, maybe these comments can be
ignored for now, but I guess similar comments will apply in the future
anyhow (just with some name changes).

==

1. Commit message

Add a new GUC force_stream_mode, when it is set on, send the change to
output plugin immediately in streaming mode. Otherwise, send until
logical_decoding_work_mem is exceeded.

~

Is that quite right? I thought it was more like shown below:

SUGGESTION
Add a new GUC 'force_stream_mode' which modifies behavior when
streaming mode is enabled. If force_stream_mode is on, the changes are
sent to the output plugin immediately. Otherwise (when
force_stream_mode is off), changes are written to memory until
logical_decoding_work_mem is exceeded.

==

2. doc/src/sgml/config.sgml

+   
+Specifies whether to force sending the changes to output plugin
+immediately in streaming mode. If set to off (the
+default), send until logical_decoding_work_mem is
+exceeded.
+   

Suggest slight rewording like #1.

SUGGESTION
This modifies the behavior when streaming mode is enabled. If set to
on, the changes are sent to the output plugin
immediately. If set to off (the default), changes are
written to memory until logical_decoding_work_mem
is exceeded.

==

3. More docs.

It might be helpful if this developer option is referenced also on the
page "31.10.1 Logical Replication > Configuration Settings >
Publishers" [1]

==

src/backend/replication/logical/reorderbuffer.c

4. GUCs

+/*
+ * Whether to send the change to output plugin immediately in streaming mode.
+ * When it is off, wait until logical_decoding_work_mem is exceeded.
+ */
+bool force_stream_mode;

4a.
"to output plugin" -> "to the output plugin"

~

4b.
By convention (see [2]), IIUC there should be some indication that
these (this and 'logical_decoding_work_mem') are GUC variables. E.g.
these should be refactored to be grouped together without the other
static var in between. And add a comment for them both like:

/* GUC variable. */

~

4c.
Also, (see [2]) it makes the code more readable to give the GUC an
explicit initial value.

SUGGESTION
bool force_stream_mode = false;
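
Putting 4b and 4c together, the declarations could end up looking something
like this (sketch only; the logical_decoding_work_mem default shown is just
for illustration):

    /* GUC variables */
    int     logical_decoding_work_mem = 65536;  /* in kB */
    bool    force_stream_mode = false;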

~~~

5. ReorderBufferCheckMemoryLimit

+ /* we know there has to be one, because the size is not zero */

Uppercase the comment. Although it is not strictly added by this patch, you
might as well make this change while adjusting the indentation.

==

src/backend/utils/misc/guc_tables.c

6.

+ {
+ {"force_stream_mode", PGC_USERSET, DEVELOPER_OPTIONS,
+ gettext_noop("Force sending the changes to output plugin immediately if streaming is supported, without waiting till logical_decoding_work_mem."),
+ NULL,
+ GUC_NOT_IN_SAMPLE
+ },
+ &force_stream_mode,
+ false,
+ NULL, NULL, NULL
+ },

"without waiting till logical_decoding_work_mem." seem like an
incomplete sentence

SUGGESTION
Force sending any streaming changes to the output plugin immediately
without waiting until logical_decoding_work_mem is exceeded."),

--
[1] https://www.postgresql.org/docs/devel/logical-replication-config.html
[2] GUC declarations -
https://github.com/postgres/postgres/commit/d9d873bac67047cfacc9f5ef96ee488f2cb0f1c3

Kind Regards,
Peter Smith.
Fujitsu Australia