Re: recovery modules

2023-01-29 Thread Michael Paquier
On Fri, Jan 27, 2023 at 10:27:29PM -0800, Nathan Bossart wrote:
> Here is a work-in-progress patch set for adjusting the archive modules
> interface.  Is this roughly what you had in mind?

I have been catching up with what is happening here.  I can get
behind the idea to use the term "callbacks" vs "context" for clarity,
to move the callback definitions into their own header, and to add
extra arguments to the callback functions for some private data.

-void
-_PG_archive_module_init(ArchiveModuleCallbacks *cb)
+const ArchiveModuleCallbacks *
+_PG_archive_module_init(void **arg)
 {
AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
 
-   cb->check_configured_cb = basic_archive_configured;
-   cb->archive_file_cb = basic_archive_file;
+   (*arg) = (void *) AllocSetContextCreate(TopMemoryContext,
+   "basic_archive",
+   ALLOCSET_DEFAULT_SIZES);
+
+   return _archive_callbacks;

Now, I find this part, where we use a double pointer to allow the
module initialization to create and give back a private area, rather
confusing, and I think that it could be bug-prone as well.  Once you
incorporate some data within the set of callbacks, isn't this stuff
close to "state" data, or just something that we could simply call an
"ArchiveModule"?  Could it make more sense to have
_PG_archive_module_init() return a structure with everything, rather
than a separate in/out argument?  Here is the idea, simply:
typedef struct ArchiveModule {
	ArchiveCallbacks *routines;
	void	   *private_data;
	/* Potentially more here, like some flags? */
} ArchiveModule;

That would be closer to the style of xlogreader.h, for example.
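
For illustration, here is a rough sketch of how basic_archive's
initialization could look with that kind of struct (names and exact
shape are assumptions for the sake of the example, not a settled
interface):

static ArchiveModule basic_archive_module = {
	.routines = &basic_archive_callbacks,
	.private_data = NULL,
};

const ArchiveModule *
_PG_archive_module_init(void)
{
	/* private memory context, as in the snippet quoted above */
	basic_archive_module.private_data =
		AllocSetContextCreate(TopMemoryContext,
							  "basic_archive",
							  ALLOCSET_DEFAULT_SIZES);

	return &basic_archive_module;
}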

In the end, all these choices should be documented in
archive_module.h.
--
Michael




meson: Optionally disable installation of test modules

2023-01-29 Thread Peter Eisentraut
One open issue (IMO) with the meson build system is that it installs the 
test modules under src/test/modules/ as part of a normal installation. 
This is because there is no way to set up the build system to install 
extra things only when told.  I think we still need a way to disable 
this somehow, so that building a production installation doesn't end up 
with a bunch of extra files.


The attached simple patch is a starting point for discussion.  It just 
disables the subdirectory src/test/modules/ based on some Boolean 
setting.  This could be some new top-level option, or maybe part of 
PG_TEST_EXTRA, or something else?  With this, I get an identical set of 
installed files from meson.  I imagine this option would be false by 
default and developers would enable it.


Thoughts?
From 2573753c7af13f24cd8de404cbc6e54fdbb3040a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 30 Jan 2023 08:36:37 +0100
Subject: [PATCH v1] meson: Option to not install test modules

---
 meson.build  | 4 
 src/test/meson.build | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index e379a252a5..603a79668c 100644
--- a/meson.build
+++ b/meson.build
@@ -2809,6 +2809,10 @@ backend_code = declare_dependency(
 # libraries.
 
 
+# WIP
+test_install = false
+
+
 # Then through the main sources. That way contrib can have dependencies on
 # main sources. Note that this explicitly doesn't enter src/test, right now a
 # few regression tests depend on contrib files.
diff --git a/src/test/meson.build b/src/test/meson.build
index 5f3c9c2ba2..adef37b22a 100644
--- a/src/test/meson.build
+++ b/src/test/meson.build
@@ -6,7 +6,9 @@ subdir('isolation')
 subdir('authentication')
 subdir('recovery')
 subdir('subscription')
-subdir('modules')
+if test_install
+  subdir('modules')
+endif
 
 if ssl.found()
   subdir('ssl')
-- 
2.39.1



RE: Deadlock between logrep apply worker and tablesync worker

2023-01-29 Thread houzj.f...@fujitsu.com
On Monday, January 30, 2023 2:32 PM Amit Kapila  wrote:
> 
> On Mon, Jan 30, 2023 at 9:20 AM vignesh C  wrote:
> >
> > On Sat, 28 Jan 2023 at 11:26, Amit Kapila  wrote:
> > >
> > > One thing that looks a bit odd is that we will anyway have a similar
> > > check in replorigin_drop_guts() which is a static function and
> > > called from only one place, so, will it be required to check at both 
> > > places?
> >
> > There is a possibility that the initial check to verify if replication
> > origin exists in replorigin_drop_by_name was successful but later one
> > of either table sync worker or apply worker process might have dropped
> > the replication origin,
> >
> 
> Won't locking on the particular origin prevent concurrent drops? IIUC, the
> drop happens after the patch acquires the lock on the origin.

Yes, I think the existence check in replorigin_drop_guts is unnecessary as we
already lock the origin before that. I think the check in replorigin_drop_guts
is a custom check after calling SearchSysCache1 to get the tuple, but the error
should not happen as no concurrent drop can be performed.

To make it simpler, one idea is to move the code that gets the tuple from the
system cache into replorigin_drop_by_name(). After locking the origin, we can
try to get the tuple and do the existence check, and we can reuse this tuple
to perform the origin delete. In this approach we only need to check origin
existence once, after locking. BTW, if we do this, then we'd better rename
replorigin_drop_guts() to something like replorigin_state_clear(), as the
function would then only clear the in-memory information.

The code could be like:

---
replorigin_drop_by_name(const char *name, bool missing_ok, bool nowait)
...
	/*
	 * Lock the origin to prevent concurrent drops.  We keep the lock until
	 * the end of transaction.
	 */
	LockSharedObject(ReplicationOriginRelationId, roident, 0,
					 AccessExclusiveLock);

	tuple = SearchSysCache1(REPLORIGIDENT, ObjectIdGetDatum(roident));
	if (!HeapTupleIsValid(tuple))
	{
		if (!missing_ok)
			elog(ERROR, "cache lookup failed for replication origin with ID %d",
				 roident);

		return;
	}

	replorigin_state_clear(rel, roident, nowait);

	/*
	 * Now, we can delete the catalog entry.
	 */
	CatalogTupleDelete(rel, &tuple->t_self);
	ReleaseSysCache(tuple);

	CommandCounterIncrement();
...
---

Best Regards,
Hou zj



Re: generate_series for timestamptz and time zone problem

2023-01-29 Thread Tom Lane
Gurjeet Singh  writes:
> On Sat, Nov 12, 2022 at 10:44 AM Tom Lane  wrote:
>> I looked over the v5 patch very briefly, and have two main
>> complaints:
>> ...

> Please see attached v6 of the patch.

Thanks for updating that!

> I'm not sure of the convention around authorship. But since this was
> not an insignificant amount of work, would this patch be considered as
> co-authored by Przemyslaw and myself? Should I add myself to Authors
> field in the Commitfest app?

While I'm not promising to commit this, if I were doing so I would
cite both of you as authors.  So feel free to change the CF entry.

regards, tom lane




Re: generate_series for timestamptz and time zone problem

2023-01-29 Thread Gurjeet Singh
On Sat, Nov 12, 2022 at 10:44 AM Tom Lane  wrote:
> However, what it shouldn't be is part of this patch.  It's worth
> pushing it separately to have a record of that decision.  I've
> now done that, so you'll need to rebase to remove that delta.
>
> I looked over the v5 patch very briefly, and have two main
> complaints:
>
> * There's no documentation additions.  You can't add a user-visible
> function without adding an appropriate entry to func.sgml.
>
> * I'm pretty unimpressed with the whole truncate-to-interval thing
> and would recommend you drop it.  I don't think it's adding much
> useful functionality beyond what we can already do with the existing
> date_trunc variants; and the definition seems excessively messy
> (too many restrictions and special cases).

Please see attached v6 of the patch.

The changes since v5 are:
.) Rebased and resolved conflicts caused by commit 533e02e92.
.) Removed code and tests related to new date_trunc() functions, as
suggested by Tom.
.) Added 3 more variants to accompany date_add(tstz, interval, zone):
date_add(tstz, interval)
date_subtract(tstz, interval)
date_subtract(tstz, interval, zone)

.) Eliminated code duplication; used a common function to implement the
generate_series_timestamptz[_at_zone]() functions.
.) Fixed a bug where one of the new code paths,
generate_series_timestamptz_with_zone(), did not perform the
TIMESTAMP_NOT_FINITE() check.
.) Replaced some DirectFunctionCall?() with direct calls to the
relevant *_internal() function; should be better for performance.
.) Added documentation for all 5 functions (2 date_add(), 2
date_subtract(), 1 overloaded version of generate_series()).

I'm not sure of the convention around authorship. But since this was
not an insignificant amount of work, would this patch be considered as
co-authored by Przemyslaw and myself? Should I add myself to Authors
field in the Commitfest app?

Hi Przemyslaw,

I started working on this patch based on Tom's review a few days
ago, since you hadn't responded in a while, and I presumed you're not
working on this anymore. I should've consulted with/notified you of my
intent before starting to work on it, to avoid duplication of work.
Sorry if this submission obviates any work you have in progress.
Please feel free to provide your feedback on the v6 of the patch.

Best regards,
Gurjeet
http://Gurje.et


generate_series_with_timezone.v6.patch
Description: Binary data


Re: pub/sub - specifying optional parameters without values.

2023-01-29 Thread Peter Smith
On Mon, Jan 30, 2023 at 8:36 AM Tom Lane  wrote:
>
> Zheng Li  writes:
> > The behavior is due to the following code
> > https://github.com/postgres/postgres/blob/master/src/backend/commands/define.c#L113
>
> Yeah, so you can grep for places that have this behavior by looking
> for defGetBoolean calls ... and there are quite a few.  That leads
> me to the conclusion that we'd better invent a fairly stylized
> documentation solution that we can plug into a lot of places,
> rather than thinking of slightly different ways to say it and
> places to say it.  I'm not necessarily opposed to Peter's desire
> to fix replication-related commands first, but we have more to do
> later.
>
> I'm also not that thrilled with putting the addition up at the top
> of the relevant text.  This behavior is at least two decades old,
> so if we've escaped documenting it at all up to now, it can't be
> that important to most people.
>
> I also notice that ALTER SUBSCRIPTION has fully three different
> sub-sections with about equal claims on this note, if we're going
> to stick it directly into the affected option lists.
>
> That all leads me to propose that we add the new text at the end of
> the Parameters section in the affected man pages.  So about
> like the attached.  (I left out alter_publication.sgml, as I'm not
> sure it needs its own copy of this text --- it doesn't describe
> individual parameters at all, just refers to CREATE PUBLICATION.)
>
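
(For reference, the behavior in question comes down to the check near
the top of defGetBoolean() in define.c, the code referenced above;
paraphrasing:

	if (def->arg == NULL)
		return true;	/* option given with no value means "true" */

so any boolean option parsed through that path can be written without a
value.)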

The v3 patch LGTM (just for the logical replication commands).

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-01-29 Thread Masahiko Sawada
On Mon, Jan 30, 2023 at 1:08 PM Dilip Kumar  wrote:
>
> On Thu, Jan 26, 2023 at 12:39 PM John Naylor
>  wrote:
> >
> >
> > On Tue, Jan 24, 2023 at 1:17 PM Dilip Kumar  wrote:
> > >
> > > On Mon, Jan 23, 2023 at 6:00 PM John Naylor
> > >  wrote:
> > > >
> > > > Attached is a rebase to fix conflicts from recent commits.
> > >
> > > I have reviewed v22-0022* patch and I have some comments.
> > >
> > > 1.
> > > >It also changes to the column names max_dead_tuples and num_dead_tuples 
> > > >and to
> > > >show the progress information in bytes.
> > >
> > > I think this statement needs to be rephrased.
> >
> > Could you be more specific?
>
> I mean the below statement in the commit message doesn't look
> grammatically correct to me.
>
> "It also changes to the column names max_dead_tuples and
> num_dead_tuples and to show the progress information in bytes."
>

I've changed the commit message in the v23 patch. Please check it.
Other comments are also incorporated in the v23 patch. Thank you for
the comments!

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-29 Thread Kyotaro Horiguchi
At Mon, 30 Jan 2023 11:56:33 +0530, Amit Kapila  wrote 
in 
> On Mon, Jan 30, 2023 at 9:43 AM Kyotaro Horiguchi
>  wrote:
> >
> > At Mon, 30 Jan 2023 08:51:05 +0530, Amit Kapila  
> > wrote in
> > > On Mon, Jan 30, 2023 at 8:32 AM Kyotaro Horiguchi
> > >  wrote:
> > > >
> > > > At Sat, 28 Jan 2023 04:28:29 +, "Takamichi Osumi (Fujitsu)" 
> > > >  wrote in
> > > > > On Friday, January 27, 2023 8:00 PM Amit Kapila 
> > > > >  wrote:
> > > > > > So, you have changed min_apply_delay from int64 to int32, but you 
> > > > > > haven't
> > > > > > mentioned the reason for the same? We use 'int' for the similar 
> > > > > > parameter
> > > > > > recovery_min_apply_delay, so, ideally, it makes sense but still 
> > > > > > better to tell your
> > > > > > reason explicitly.
> > > > > Yes. It's because I thought I need to make this feature consistent 
> > > > > with the recovery_min_apply_delay.
> > > > > This feature handles the range same as the recovery_min_apply delay 
> > > > > from 0 to INT_MAX now
> > > > > so should be adjusted to match it.
> > > >
> > > > INT_MAX can stick out of int32 on some platforms. (I'm not sure where
> > > > that actually happens, though.) We can use PG_INT32_MAX instead.
> > > >
> > >
> > > But in other integer GUCs including recovery_min_apply_delay, we use
> > > INT_MAX, so not sure if it is a good idea to do something different
> > > here.
> >
> > The GUC is not stored in a catalog, but.. oh... it is multiplied by
> > 1000.
> 
> Which part of the patch you are referring to here? Isn't the check in

Where recovery_min_apply_delay is used. It is allowed to be set up to
INT_MAX but it is used as:

>   delayUntil = TimestampTzPlusMilliseconds(xtime, 
> recovery_min_apply_delay);

Where the macro is defined as:

> #define TimestampTzPlusMilliseconds(tz,ms) ((tz) + ((ms) * (int64) 1000))

Which can lead to overflow, which is practically harmless.

> the function defGetMinApplyDelay() sufficient to ensure that the
> 'delay' value stored in the catalog will always be lesser than
> INT_MAX?

I'm concerned about cases where INT_MAX is wider than int32.  If we
don't assume such cases, I'm fine with INT_MAX there.
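
(To put rough numbers on it -- back-of-the-envelope arithmetic, not from
the original mail: with a 32-bit int, INT_MAX is 2147483647 ms, so the
macro adds 2147483647 * 1000, about 2.1e12 microseconds or ~24.8 days;
the int64 addition can then only overflow for timestamps within roughly
25 days of the maximum representable value, hence "practically
harmless".  If INT_MAX were a 64-bit value, though, the multiplication
by 1000 itself would overflow int64.)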

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Logical replication timeout problem

2023-01-29 Thread Amit Kapila
On Mon, Jan 30, 2023 at 10:36 AM wangw.f...@fujitsu.com
 wrote:
>
> On Mon, Jan 30, 2023 11:37 AM Shi, Yu/侍 雨  wrote:
> > On Sun, Jan 29, 2023 3:41 PM wangw.f...@fujitsu.com
> >  wrote:
>
> Yes, I think you are right.
> Fixed this problem.
>

+ /*
+ * Trying to send keepalive message after every change has some
+ * overhead, but testing showed there is no noticeable overhead if
+ * we do it after every ~100 changes.
+ */
+#define CHANGES_THRESHOLD 100
+
+ if (++changes_count < CHANGES_THRESHOLD)
+ return;
...
+ changes_count = 0;

I think it is better to have this threshold-related code in that
caller, as we had in the previous version. Also, let's modify the
comment as follows:
"It is possible that the data is not sent downstream for a long time
either because the output plugin filtered it or there is a DDL that
generates a lot of data that is not processed by the plugin. So, in
such cases, the downstream can timeout. To avoid that we try to send a
keepalive message if required. Trying to send a keepalive message
after every change has some overhead, but testing showed there is no
noticeable overhead if we do it after every ~100 changes."
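
Roughly, the end result could look like this (an illustrative sketch
only; where exactly it lives and the precise
OutputPluginUpdateProgress() arguments are assumptions, not the actual
patch):

#define CHANGES_THRESHOLD 100

static void
update_progress_and_keepalive(LogicalDecodingContext *ctx)
{
	static int	changes_count = 0;

	/* (the comment wording suggested above goes here) */
	if (++changes_count < CHANGES_THRESHOLD)
		return;

	OutputPluginUpdateProgress(ctx, false);
	changes_count = 0;
}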

-- 
With Regards,
Amit Kapila.




Re: Deadlock between logrep apply worker and tablesync worker

2023-01-29 Thread Amit Kapila
On Mon, Jan 30, 2023 at 9:20 AM vignesh C  wrote:
>
> On Sat, 28 Jan 2023 at 11:26, Amit Kapila  wrote:
> >
> > One thing that looks a bit odd is that we will anyway have a similar
> > check in replorigin_drop_guts() which is a static function and called
> > from only one place, so, will it be required to check at both places?
>
> There is a possibility that the initial check to verify if replication
> origin exists in replorigin_drop_by_name was successful but later one
> of either table sync worker or apply worker process might have dropped
> the replication origin,
>

Won't locking on the particular origin prevent concurrent drops? IIUC,
the drop happens after the patch acquires the lock on the origin.

-- 
With Regards,
Amit Kapila.




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-29 Thread Amit Kapila
On Mon, Jan 30, 2023 at 9:43 AM Kyotaro Horiguchi
 wrote:
>
> At Mon, 30 Jan 2023 08:51:05 +0530, Amit Kapila  
> wrote in
> > On Mon, Jan 30, 2023 at 8:32 AM Kyotaro Horiguchi
> >  wrote:
> > >
> > > At Sat, 28 Jan 2023 04:28:29 +, "Takamichi Osumi (Fujitsu)" 
> > >  wrote in
> > > > On Friday, January 27, 2023 8:00 PM Amit Kapila 
> > > >  wrote:
> > > > > So, you have changed min_apply_delay from int64 to int32, but you 
> > > > > haven't
> > > > > mentioned the reason for the same? We use 'int' for the similar 
> > > > > parameter
> > > > > recovery_min_apply_delay, so, ideally, it makes sense but still 
> > > > > better to tell your
> > > > > reason explicitly.
> > > > Yes. It's because I thought I need to make this feature consistent with 
> > > > the recovery_min_apply_delay.
> > > > This feature handles the range same as the recovery_min_apply delay 
> > > > from 0 to INT_MAX now
> > > > so should be adjusted to match it.
> > >
> > > INT_MAX can stick out of int32 on some platforms. (I'm not sure where
> > > that actually happens, though.) We can use PG_INT32_MAX instead.
> > >
> >
> > But in other integer GUCs including recovery_min_apply_delay, we use
> > INT_MAX, so not sure if it is a good idea to do something different
> > here.
>
> The GUC is not stored in a catalog, but.. oh... it is multiplied by
> 1000.

Which part of the patch are you referring to here? Isn't the check in
the function defGetMinApplyDelay() sufficient to ensure that the
'delay' value stored in the catalog will always be less than
INT_MAX?

-- 
With Regards,
Amit Kapila.




RE: Perform streaming logical transactions by background workers and parallel apply

2023-01-29 Thread houzj.f...@fujitsu.com
On Thursday, January 26, 2023 11:37 AM Kuroda, Hayato/黒田 隼人 
 wrote:
> 
> Followings are comments.

Thanks for the comments.

> In this test the rollback-prepared seems not to be executed. This is because
> serializations are finished while handling PREPARE message and the final
> state of transaction does not affect that, right? I think it may be helpful
> to add a one line comment.

Yes, but I am slightly unsure if it would be helpful to add this, as we only
test basic cases (mainly for code coverage) for partial serialization.

> 
> 1. config.sgml
> 
> ```
> +the changes till logical_decoding_work_mem is reached. It can also
> be
> ```
> 
> I think it should be sandwiched by .

Added.

> 
> 2. config.sgml
> 
> ```
> +On the publisher side,
> logical_replication_mode allows
> +allows streaming or serializing changes immediately in logical
> decoding.
> ```
> 
> Typo "allows allows" -> "allows"

Fixed.

> 3. test general
> 
> You confirmed that the leader started to serialize changes, but did not ensure
> the endpoint.
> IIUC the parallel apply worker exits after applying serialized changes, and 
> it is
> not tested yet.
> Can we add polling the log somewhere?

I checked other tests and didn't find any examples where we test the exit of
the apply worker or table sync worker. And if the parallel apply worker doesn't
stop in this case, we will fail anyway when reusing this worker to handle the
next transaction because the queue is broken. So, I prefer to keep the tests
short.

> 4. 015_stream.pl
> 
> ```
> +is($result, qq(15000), 'all changes are replayed from file')
> ```
> 
> The statement may be unclear because changes can be also replicated when
> streaming = on.
> How about: "parallel apply worker replayed all changes from file"?

Changed.

Best regards,
Hou zj


RE: Perform streaming logical transactions by background workers and parallel apply

2023-01-29 Thread houzj.f...@fujitsu.com
On Monday, January 30, 2023 12:13 PM Peter Smith  wrote:
> 
> Here are my review comments for v88-0002.

Thanks for your comments.

> 
> ==
> General
> 
> 1.
> The test cases are checking the log content but they are not checking for
> debug logs or untranslated elogs -- they are expecting a normal ereport LOG
> that might be translated. I’m not sure if that is OK, or if it is a potential 
> problem.

We have tests that check the ereport ERROR and ereport WARNING messages (by
searching for the ERROR or WARNING keywords in all the TAP tests), so I think
checking the LOG should be fine.

> ==
> doc/src/sgml/config.sgml
> 
> 2.
> On the publisher side, logical_replication_mode allows allows streaming or
> serializing changes immediately in logical decoding. When set to immediate,
> stream each change if streaming option (see optional parameters set by
> CREATE SUBSCRIPTION) is enabled, otherwise, serialize each change. When set
> to buffered, the decoding will stream or serialize changes when
> logical_decoding_work_mem is reached.
> 
> 2a.
> typo "allows allows"  (Kuroda-san reported same)
> 
> 2b.
> "if streaming option" --> "if the streaming option"

Changed.

> ~~~
> 
> 3.
> On the subscriber side, if streaming option is set to parallel,
> logical_replication_mode also allows the leader apply worker to send changes
> to the shared memory queue or to serialize changes.
> 
> SUGGESTION
> On the subscriber side, if the streaming option is set to parallel,
> logical_replication_mode can be used to direct the leader apply worker to
> send changes to the shared memory queue or to serialize changes.

Changed.

> ==
> src/backend/utils/misc/guc_tables.c
> 
> 4.
>   {
>   {"logical_replication_mode", PGC_USERSET, DEVELOPER_OPTIONS,
> - gettext_noop("Controls when to replicate each change."),
> - gettext_noop("On the publisher, it allows streaming or serializing each
> change in logical decoding."),
> + gettext_noop("Controls the internal behavior of logical replication
> publisher and subscriber"),
> + gettext_noop("On the publisher, it allows streaming or "
> + "serializing each change in logical decoding. On the "
> + "subscriber, in parallel streaming mode, it allows "
> + "the leader apply worker to serialize changes to "
> + "files and notifies the parallel apply workers to "
> + "read and apply them at the end of the transaction."),
>   GUC_NOT_IN_SAMPLE
>   },
> Suggest re-wording the long description (subscriber part) to be more like the
> documentation text.
> 
> BEFORE
> On the subscriber, in parallel streaming mode, it allows the leader apply 
> worker
> to serialize changes to files and notifies the parallel apply workers to read 
> and
> apply them at the end of the transaction.
> 
> SUGGESTION
> On the subscriber, if the streaming option is set to parallel, it directs the 
> leader
> apply worker to send changes to the shared memory queue or to serialize
> changes and apply them at the end of the transaction.
> 

Changed.

Attached is the new version of the patch, which addresses all comments so far
(v88-0001 has been committed, so we only have one remaining patch this time).

Best Regards,
Hou zj


v89-0001-Extend-the-logical_replication_mode-to-test-the-.patch
Description:  v89-0001-Extend-the-logical_replication_mode-to-test-the-.patch


Re: cutting down the TODO list thread

2023-01-29 Thread John Naylor
On Tue, Jan 24, 2023 at 11:57 PM Bruce Momjian  wrote:
>
> On Tue, Jan 24, 2023 at 10:46:34AM +0700, John Naylor wrote:
> >

> > https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F#TODOs

> > to:
> > "It's worth checking if the feature of interest is found in the TODO
list on
> > our wiki: http://wiki.postgresql.org/wiki/TODO. That list contains some
known
> > PostgreSQL bugs, some feature requests, and some things we are not even
sure we
> > want. Many entries have a link to an email thread containing prior
discussion,
> > or perhaps attempts that for whatever reason didn't make it as far as
getting
> > committed."
> >
> > ...which might make more sense if moved below the "brand new features"
section.
>
> I think we just point them at the TODO list and they will read the top
> of the list first, no?  I think you are right that we updated the top of
> the TODO but didn't update the places that link to it.  I am thinking we
> should just trim down the text linking to it and let the top of the TODO
> list do its job.

Okay. How about:

"It's worth checking if the feature of interest is found in the TODO list
on our wiki: http://wiki.postgresql.org/wiki/TODO. The entries there often
have additional information about the feature and may point to reasons why
it hasn't been implemented yet."

> > --
> > https://wiki.postgresql.org/wiki/Developer_FAQ
> >
> > 1)
> > from:
> > "What areas need work?
> > Outstanding features are detailed in Todo.
> >
> > You can learn more about these features by consulting the archives, the SQL
> > standards and the recommended texts (see books for developers)."
> >
> > to:
> > ??? -> For "what areas need work?", we need to have a different answer, but
> > I'm not sure what it is.
>
> Wow, I would not send a new person to the SQL standard docs.  ;-)  I am
> thinking we just don't have a good answer to this so let's say less.

Do I understand right that we could just remove this entire section "What
areas need work?"?

> > 2)
> > from:
> > "What do I do after choosing an item to work on?
> >
> > Send an email to pgsql-hackers with a proposal for what you want to do
> > (assuming your contribution is not trivial). Working in isolation is not
> > advisable because others might be working on the same TODO item, or you might
> > have misunderstood the TODO item. In the email, discuss both the internal
> > implementation method you plan to use, and any user-visible changes (new
> > syntax, etc)."
> >
> > to:
> > "What do I do after choosing an area to work on?
> >
> > Send an email to pgsql-hackers with a proposal for what you want to do
> > (assuming your contribution is not trivial). Working in isolation is not
>
> Can new people identify trivial?

I'd say they have some idea about that, since we do regularly get typo
fixes and doc clarifications. Sure there is some grey area, but I don't
think the dividing point is important. The important thing is, we also
sometimes get large and invasive patches without design discussion, which
we want to discourage.

> I can now see that just removing the [E] label totally is the right
> answer.  Yes, there might be an easy item on there, but the fact we have
> three labeled and they are not easy makes me thing [E] is causing more
> problems than it solves.

Okay, having heard no objections I'll remove it.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [DOCS] Stats views and functions not in order?

2023-01-29 Thread Peter Smith
On Fri, Jan 27, 2023 at 10:30 PM Peter Eisentraut
 wrote:
>
> On 19.01.23 00:45, Peter Smith wrote:
> > The original $SUBJECT requirements evolved to also try to make each
> > view appear on a separate page after that was suggested by DavidJ [2].
> > I was unable to achieve per-page views "without radically changing the
> > document structure." [3], but DavidJ found a way [4] to do it using
> > refentry. I then wrote the patch v8-0003 using that strategy, which
> > after more rebasing became the v10-0001 you see today.
> >
> > I did prefer the view-per-page results (although I also only use HTML
> > docs). But my worry is that there seem still to be a few unknowns
> > about how this might affect other (not the HTML) renderings of the
> > docs. If you think that risk is too great, or if you feel this patch
> > will cause unwarranted link/bookmark grief, then I am happy to just
> > drop it.
>
> I'm wary of making semantic markup changes to achieve an ad-hoc
> presentation effects.  Sometimes it's necessary, but it should be
> considered carefully and globally.
>
> We could change the chunking boundary to be sect2 globally.  This is
> easily configurable (chunk.section.depth).
>
> Thinking about it now, maybe this is what we need.  As the documentation
> grows, as it clearly does, the depth of the structure increases and
> pages get longer.  This can also be seen in other chapters.
>
> Of course, this would need to be tested and checked in more detail.
>

This chunk configuration idea sounds like a better approach. If somebody
else wants to champion that change separately then I can maybe help to
review it.

Meanwhile, this pagination topic has strayed far away from the
original $SUBJECT, so since there is nothing else pending, I guess this
thread's CF entry [1] can just be marked as "Committed" now?

--
[1] https://commitfest.postgresql.org/41/3904/

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-29 Thread Amit Kapila
On Mon, Jan 30, 2023 at 5:40 AM Peter Smith  wrote:
>
> Patch v88-0001 LGTM.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: Assertion failure in SnapBuildInitialSnapshot()

2023-01-29 Thread Amit Kapila
On Mon, Jan 30, 2023 at 10:27 AM Amit Kapila  wrote:
>
> On Sun, Jan 29, 2023 at 9:15 PM Masahiko Sawada  wrote:
> >
> > On Sat, Jan 28, 2023 at 11:54 PM Hayato Kuroda (Fujitsu)
> >  wrote:
> > >
> > > Dear Amit, Sawada-san,
> > >
> > > I have also reproduced the failure on PG15 with some debug log, and I 
> > > agreed that
> > > somebody changed procArray->replication_slot_xmin to InvalidTransactionId.
> > >
> > > > > The same assertion failure has been reported on another thread[1].
> > > > > Since I could reproduce this issue several times in my environment
> > > > > I've investigated the root cause.
> > > > >
> > > > > I think there is a race condition of updating
> > > > > procArray->replication_slot_xmin by CreateInitDecodingContext() and
> > > > > LogicalConfirmReceivedLocation().
> > > > >
> > > > > What I observed in the test was that a walsender process called:
> > > > > SnapBuildProcessRunningXacts()
> > > > >   LogicalIncreaseXminForSlot()
> > > > > LogicalConfirmReceivedLocation()
> > > > >   ReplicationSlotsComputeRequiredXmin(false).
> > > > >
> > > > > In ReplicationSlotsComputeRequiredXmin() it acquired the
> > > > > ReplicationSlotControlLock and got 0 as the minimum xmin since there
> > > > > was no wal sender having effective_xmin.
> > > > >
> > > >
> > > > What about the current walsender process which is processing
> > > > running_xacts via SnapBuildProcessRunningXacts()? Isn't that walsender
> > > > slot's effective_xmin have a non-zero value? If not, then why?
> > >
> > > Normal walsenders which are not for tablesync create a replication slot 
> > > with
> > > NOEXPORT_SNAPSHOT option. I think in this case, 
> > > CreateInitDecodingContext() is
> > > called with need_full_snapshot = false, and slot->effective_xmin is not 
> > > updated.
> >
> > Right. This is how we create a slot used by an apply worker.
> >
>
> I was thinking about how that led to this problem because
> GetOldestSafeDecodingTransactionId() ignores InvalidTransactionId.
>

I have reproduced it manually. For this, I had to manually make the
debugger call ReplicationSlotsComputeRequiredXmin(false) via the path
SnapBuildProcessRunningXacts()->LogicalIncreaseXminForSlot()->LogicalConfirmReceivedLocation()
->ReplicationSlotsComputeRequiredXmin(false) for the apply worker. The
sequence of events is something like: (a) the replication_slot_xmin for
the tablesync worker is overridden by the apply worker as zero, as
explained in Sawada-San's email; (b) another transaction happens on the
publisher that increases the value of ShmemVariableCache->nextXid; (c)
the tablesync worker invokes
SnapBuildInitialSnapshot()->GetOldestSafeDecodingTransactionId(), which
returns an oldestSafeXid that is higher than the snapshot's xmin.
This happens because replication_slot_xmin has an InvalidTransactionId
value, and we won't consider replication_slot_catalog_xmin because the
catalogOnly flag is false and there is no other open running
transaction. I think we should try to get a simplified test to
reproduce this problem, if possible.

-- 
With Regards,
Amit Kapila.




Re: Something is wrong with wal_compression

2023-01-29 Thread Michael Paquier
On Sat, Jan 28, 2023 at 12:02:23AM -0500, Tom Lane wrote:
> My thoughts were trending in that direction too.  It's starting
> to sound like we aren't going to be able to make a fix that
> we'd be willing to risk back-patching, even if it were completely
> compatible at the user level.
> 
> Still, the idea that txid_status() isn't trustworthy is rather
> scary.  I wonder whether there is a failure mode here that's
> exhibitable without using that.

Okay, as far as I can see, the consensus would be to not do anything
about the performance impact of these functions:
20210305.115011.558061052471425531.horikyota@gmail.com

Three of my buildfarm machines are unstable because of that; they need
something for stable branches as well, and I'd like them to stress
their options.

Based on what's been mentioned, we can:
1) tweak the test with an extra checkpoint to make sure that the XIDs
are flushed, like in the patch posted on [1].
2) tweak the test to rely on a state of the table, as
mentioned by Andrey.
3) remove entirely the test, because as introduced it does not
actually test what it should.

2) is not really interesting, IMO, because the test checks for two
things:
- an in-progress XID, which we already cover in the main regression test
suite.
- a post-crash state, and switching to an approach where some data is,
for example, scanned is no different from a lot of the other recovery
tests.

1) means more test cycles, and perhaps we could enforce compression of
WAL while we are at it?  In the end, my vote would go for 3) and
dropping the whole scenario, though there may be an argument for 1).

[1]: 
https://www.postgresql.org/message-id/20210305.115011.558061052471425531.horikyota@gmail.com
--
Michael




Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication

2023-01-29 Thread Bharath Rupireddy
On Tue, Nov 29, 2022 at 10:45 PM SATYANARAYANA NARLAPURAM
 wrote:
>
> On Tue, Nov 29, 2022 at 8:42 AM SATYANARAYANA NARLAPURAM 
>  wrote:
>>
>> On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian  wrote:
>>>
>>> On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
>>> > 2. Process proc die immediately when a backend is waiting for sync
>>> > replication acknowledgement, as it does today, however, upon restart,
>>> > don't open up for business (don't accept ready-only connections)
>>> > unless the sync standbys have caught up.
>>> >
>>> > Are you planning to block connections or queries to the database? It 
>>> > would be
>>> > good to allow connections and let them query the monitoring views but 
>>> > block the
>>> > queries until sync standby have caught up. Otherwise, this leaves a 
>>> > monitoring
>>> > hole. In cloud, I presume superusers are allowed to connect and monitor 
>>> > (end
>>> > customers are not the role members and can't query the data). The same 
>>> > can't be
>>> > true for all the installations. Could you please add more details on your
>>> > approach?
>>>
>>> I think ALTER SYSTEM should be allowed, particularly so you can modify
>>> synchronous_standby_names, no?
>>
>> Yes, Change in synchronous_standby_names is expected in this situation. 
>> IMHO, blocking all the connections is not a recommended approach.
>
> How about allowing superusers (they can still read locally committed data) 
> and users part of pg_monitor role?

I started to spend time on this feature again. Thanks all for your
comments so far.

Per latest comments, it looks like we're mostly okay to emit a warning
and ignore query cancel interrupts while waiting for sync replication
ACK.

For proc die, it looks like the suggestion was to process it
immediately and, upon the next restart, not allow user connections
until all sync standbys have caught up. However, we need to allow
replication connections from standbys so that they can stream the
needed WAL and catch up with the primary, and we need to allow
superusers or users with the pg_monitor role to connect, so they can
perform ALTER SYSTEM to remove any unresponsive sync standbys from the
list or disable sync replication altogether, or monitor the flush
LSN/catch-up status. All other connections would be blocked. Note that
replication, superuser, and pg_monitor connections are allowed only
after the server reaches a consistent state, not before, so that no
inconsistent data is read.

The trickiest part of doing the above is how we detect upon restart
that the server received proc die while waiting for a sync replication
ACK. One idea might be to set a flag in the control file before the
crash. A second idea might be to write a marker file (although I don't
favor this idea); its presence would indicate that the server was
waiting for a sync replication ACK before the crash. However, we may
not be able to detect all sorts of backend crashes that happen while
waiting for a sync replication ACK in time to do either of these two
things, so this may not be a complete solution.

A third idea might be to just let the primary wait for sync standbys to
catch up upon restart, irrespective of whether it crashed while waiting
for a sync replication ACK. While this idea works without having to
detect all sorts of crashes, the primary may not come up if any
unresponsive standbys are present (currently, the primary continues to
be operational, for read-only queries at least, irrespective of whether
sync standbys have caught up or not).

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-01-29 Thread Andres Freund
Hi,

On 2023-01-30 15:22:34 +1300, Thomas Munro wrote:
> On Mon, Jan 30, 2023 at 6:26 AM Thomas Munro  wrote:
> > out-of-order hazard
>
> I've been trying to understand how that could happen, but my CPU-fu is
> weak.  Let me try to write an argument for why it can't happen, so
> that later I can look back at how stupid and naive I was.  We have A
> B, and if the CPU sees no dependency and decides to execute B A
> (pipelined), shouldn't an interrupt either wait for the whole
> schemozzle to commit first (if not in a hurry), or nuke it, handle the
> IPI and restart, or something?

In a core local view, yes, I think so. But I don't think that's how it can
work on multi-core, and even more so, multi-socket machines. Imagine how it'd
influence latency if every interrupt on any CPU would prevent all out-of-order
execution on any CPU.


> After an hour of reviewing random
> slides from classes on out-of-order execution and reorder buffers and
> the like, I think the term for making sure that interrupts run with
> the illusion of in-order execution maintained is called "precise
> interrupts", and it is expected in all modern architectures, after the
> early OoO pioneers lost their minds trying to program without it.  I
> guess generally you want that because it would otherwise run your
> interrupt handler in a completely uncertain environment, and
> specifically in this case it would reach our signal handler which
> reads A's output (waiting) and writes to B's input (is_set), so B IPI
> A surely shouldn't be allowed?

Userspace signals aren't delivered synchronously during hardware interrupts
afaik - and I don't think they even possibly could be (after all the process
possibly isn't scheduled).

I think what you're talking about with precise interrupts above is purely
about the single-core view, and mostly about hardware interrupts for faults
etc. The CPU will unwind state from speculatively executed code etc on
interrupt, sure - but I think that's separate from guaranteeing that you can't
have stale cache contents *due to work by another CPU*.


I'm not even sure that userspace signals are generally delivered via an
immediate hardware interrupt, or whether they're processed at the next
scheduler tick. After all, we know that multiple signals are coalesced, which
certainly isn't compatible with synchronous execution. But it could be that
that just happens when the target of a signal is not currently scheduled.


> Maybe it's a much dumber sort of a concurrency problem: stale cache
> line due to missing barrier, but... commit db0f6cad488 made us also
> set our own latch (a second time) when someone sets our latch in
> releases 9.something to 13.

But this part does indeed put a crimp on some potential theories.

TBH, I'd be in favor of just adding the barriers for good measure, even if we
don't know if it's a live bug today - it seems incredibly fragile.
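
To make the hazard concrete, it is the classic store-then-load pattern
on both sides; schematically (illustrative names, not the actual
latch.c code):

	/* setter (some other process) */
	latch->is_set = true;
	pg_memory_barrier();		/* keep this store ahead of the load below */
	if (latch->waiting)
		send_wakeup(latch);		/* e.g. signal the owner */

	/* owner, just before going to sleep */
	latch->waiting = true;
	pg_memory_barrier();		/* keep this store ahead of the load below */
	if (latch->is_set)
		return;					/* already set, don't sleep */

Without full barriers there, a CPU is allowed to let the load drift
ahead of the preceding store, so the setter can read waiting == false
while the owner reads is_set == false, and both sides conclude the
other will handle the wakeup.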

Greetings,

Andres Freund




Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-01-29 Thread John Naylor
On Sun, Jan 29, 2023 at 7:05 PM Ankit Kumar Pandey 
wrote:
>
>
> > On 26/01/23 07:40, David Rowley wrote:

> > Sorting in smaller batches that better fit into CPU cache.
>
> More reading yielded that we are looking for cache-oblivious
> sorting algorithm.

Since David referred to L3 size as the starting point of a possible
configuration parameter, that's actually cache-conscious.

> One of the papers[1] mentions that in quick sort, whenever we reach a size
> which can fit in cache, instead of partitioning it further, we can do
> insertion sort there itself.

> >  Memory-tuned quicksort uses insertion sort to sort small subsets while
> > they are in the cache, instead of partitioning further. This increases
> > the instruction count, but reduces the cache misses
>
> If this looks like a step in the right direction, I can give it a try and
> do more reading and experimentation.
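
[ For reference, the cutoff idea in its simplest standalone form -- plain
C on ints, nothing like our tuplesort machinery, and the cutoff constant
is arbitrary here (the papers tie it to cache size): ]

#define CUTOFF 32				/* illustrative only */

static void
insertion_sort(int *a, int lo, int hi)
{
	for (int i = lo + 1; i <= hi; i++)
	{
		int		key = a[i];
		int		j = i - 1;

		while (j >= lo && a[j] > key)
		{
			a[j + 1] = a[j];
			j--;
		}
		a[j + 1] = key;
	}
}

static void
quicksort(int *a, int lo, int hi)
{
	int		pivot;
	int		i = lo;
	int		j = hi;

	if (hi - lo + 1 <= CUTOFF)
	{
		/* small enough to stay cache-resident: stop partitioning */
		insertion_sort(a, lo, hi);
		return;
	}

	pivot = a[lo + (hi - lo) / 2];
	while (i <= j)
	{
		while (a[i] < pivot)
			i++;
		while (a[j] > pivot)
			j--;
		if (i <= j)
		{
			int		tmp = a[i];

			a[i] = a[j];
			a[j] = tmp;
			i++;
			j--;
		}
	}
	quicksort(a, lo, j);
	quicksort(a, i, hi);
}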

I'm not close enough to this thread to guess at the right direction
(although I hope related work will help), but I have a couple general
remarks:

1. In my experience, academic papers like to test sorting with
register-sized numbers or strings. Our sorttuples are bigger, have
complex comparators, and can fall back to fields in the full tuple.
2. That paper is over 20 years old. If it demonstrated something genuinely
useful, some of those concepts would likely be implemented in the
real-world somewhere. Looking for evidence of that might be a good exercise.
3. 20 year-old results may not carry over to modern hardware.
4. Open source software is more widespread in the academic world now than
20 years ago, so papers with code (maybe even the author's github) are much
more useful to us in my view.
5. It's actually not terribly hard to make sorting faster for some specific
cases -- the hard part is keeping other inputs of interest from regressing.
6. The bigger the change, the bigger the risk of regressing somewhere.

--
John Naylor
EDB: http://www.enterprisedb.com


RE: Logical replication timeout problem

2023-01-29 Thread wangw.f...@fujitsu.com
On Mon, Jan 30, 2023 11:37 AM Shi, Yu/侍 雨  wrote:
> On Sun, Jan 29, 2023 3:41 PM wangw.f...@fujitsu.com
>  wrote:
> >
> > I tested a mix transaction of transactional and non-transactional messages 
> > on
> > the current HEAD and reproduced the timeout problem. I think this result is
> OK.
> > Because when decoding a transaction, non-transactional changes are
> processed
> > directly and the function WalSndKeepaliveIfNecessary is called, while
> > transactional changes are cached and processed after decoding. After
> decoding,
> > only transactional changes will be processed (in the function
> > ReorderBufferProcessTXN), so the timeout problem will still be reproduced.
> >
> > After applying the v8 patch, the test mentioned above didn't reproduce the
> > timeout problem (Attach this test script 'test_with_nontransactional.sh').
> >
> > Attach the new patch.
> >
> 
> Thanks for updating the patch. Here is a comment.

Thanks for your comment.

> In update_progress_txn_cb_wrapper(), it looks like we need to reset
> changes_count to 0 after calling OutputPluginUpdateProgress(), otherwise
> OutputPluginUpdateProgress() will always be called after 100 changes.

Yes, I think you are right.
Fixed this problem.

Attached is the new patch.

Regards,
Wang Wei


v9-0001-Fix-the-logical-replication-timeout-during-proces.patch
Description:  v9-0001-Fix-the-logical-replication-timeout-during-proces.patch


Re: Fix GUC_NO_SHOW_ALL test scenario in 003_check_guc.pl

2023-01-29 Thread Michael Paquier
On Sun, Jan 29, 2023 at 01:05:07PM -0500, Tom Lane wrote:
> I kind of think this is a lot of unnecessary work.  The case that is
> problematic is a GUC that's marked GUC_NO_SHOW_ALL but not marked
> GUC_NOT_IN_SAMPLE.  There aren't any of those, and I don't think there
> are likely to be any in future, because it doesn't make a lot of sense.
> Why don't we just make a policy against doing that, and enforce it
> with an assertion somewhere in GUC initialization?

[..thinks..]

Looking at guc.sql, I think that there is a second mistake: the test
checks for (no_show_all AND !no_reset_all), but this won't work
because NO_SHOW_ALL GUCs cannot be scanned via SQL.  There are two
parameters that include this combination of flags, default_with_oids
and ssl_renegotiation_limit, so the check would not do what it
should.  I think that this part should be removed.

For the second part, preventing GUCs from being marked as NO_SHOW_ALL &&
!NOT_IN_SAMPLE, check_GUC_init() looks like the right place to me,
because this routine has been designed exactly for this purpose.

So, what do you think about the attached?
--
Michael
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 978b385568..93da5d24ba 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1383,11 +1383,14 @@ check_GUC_name_for_parameter_acl(const char *name)
 }
 
 /*
- * Routine in charge of checking that the initial value of a GUC is the
- * same when declared and when loaded to prevent anybody looking at the
- * C declarations of these GUCS from being fooled by mismatched values.
+ * Routine in charge of checking various states of a GUC.
  *
- * The following validation rules apply:
+ * This performs two sanity checks.  First, it checks that the initial
+ * value of a GUC is the same when declared and when loaded to prevent
+ * anybody looking at the C declarations of these GUCS from being fooled by
+ * mismatched values.  Second, it checks for incorrect flag combinations.
+ *
+ * The following validation rules apply for the values:
  * bool - can be false, otherwise must be same as the boot_val
  * int  - can be 0, otherwise must be same as the boot_val
  * real - can be 0.0, otherwise must be same as the boot_val
@@ -1398,6 +1401,7 @@ check_GUC_name_for_parameter_acl(const char *name)
 static bool
 check_GUC_init(struct config_generic *gconf)
 {
+	/* Checks on values */
 	switch (gconf->vartype)
 	{
 		case PGC_BOOL:
@@ -1462,6 +1466,16 @@ check_GUC_init(struct config_generic *gconf)
 			}
 	}
 
+	/* Flag combinations */
+	/* GUC_NO_SHOW_ALL requires GUC_NOT_IN_SAMPLE */
+	if ((gconf->flags & GUC_NO_SHOW_ALL) &&
+		!(gconf->flags & GUC_NOT_IN_SAMPLE))
+	{
+		elog(LOG, "GUC %s flags: NO_SHOW_ALL and !NOT_IN_SAMPLE",
+			 gconf->name);
+		return false;
+	}
+
 	return true;
 }
 #endif
diff --git a/src/test/regress/expected/guc.out b/src/test/regress/expected/guc.out
index 2914b950b4..3185ec0063 100644
--- a/src/test/regress/expected/guc.out
+++ b/src/test/regress/expected/guc.out
@@ -880,15 +880,7 @@ SELECT name FROM tab_settings_flags
 --
 (0 rows)
 
--- NO_SHOW_ALL implies NO_RESET_ALL, and vice-versa.
-SELECT name FROM tab_settings_flags
-  WHERE no_show_all AND NOT no_reset_all
-  ORDER BY 1;
- name 
---
-(0 rows)
-
--- Exceptions are transaction_*.
+-- NO_RESET_ALL can be specified without NO_SHOW_ALL, like transaction_*.
 SELECT name FROM tab_settings_flags
   WHERE NOT no_show_all AND no_reset_all
   ORDER BY 1;
@@ -899,14 +891,6 @@ SELECT name FROM tab_settings_flags
  transaction_read_only
 (3 rows)
 
--- NO_SHOW_ALL implies NOT_IN_SAMPLE.
-SELECT name FROM tab_settings_flags
-  WHERE no_show_all AND NOT not_in_sample
-  ORDER BY 1;
- name 
---
-(0 rows)
-
 -- NO_RESET implies NO_RESET_ALL.
 SELECT name FROM tab_settings_flags
   WHERE no_reset AND NOT no_reset_all
diff --git a/src/test/regress/sql/guc.sql b/src/test/regress/sql/guc.sql
index d40d0c178f..f672b88b54 100644
--- a/src/test/regress/sql/guc.sql
+++ b/src/test/regress/sql/guc.sql
@@ -349,18 +349,10 @@ SELECT name FROM tab_settings_flags
 SELECT name FROM tab_settings_flags
   WHERE category = 'Preset Options' AND NOT not_in_sample
   ORDER BY 1;
--- NO_SHOW_ALL implies NO_RESET_ALL, and vice-versa.
-SELECT name FROM tab_settings_flags
-  WHERE no_show_all AND NOT no_reset_all
-  ORDER BY 1;
--- Exceptions are transaction_*.
+-- NO_RESET_ALL can be specified without NO_SHOW_ALL, like transaction_*.
 SELECT name FROM tab_settings_flags
   WHERE NOT no_show_all AND no_reset_all
   ORDER BY 1;
--- NO_SHOW_ALL implies NOT_IN_SAMPLE.
-SELECT name FROM tab_settings_flags
-  WHERE no_show_all AND NOT not_in_sample
-  ORDER BY 1;
 -- NO_RESET implies NO_RESET_ALL.
 SELECT name FROM tab_settings_flags
   WHERE no_reset AND NOT no_reset_all




Re: Assertion failure in SnapBuildInitialSnapshot()

2023-01-29 Thread Amit Kapila
On Sun, Jan 29, 2023 at 9:15 PM Masahiko Sawada  wrote:
>
> On Sat, Jan 28, 2023 at 11:54 PM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Dear Amit, Sawada-san,
> >
> > I have also reproduced the failure on PG15 with some debug log, and I 
> > agreed that
> > somebody changed procArray->replication_slot_xmin to InvalidTransactionId.
> >
> > > > The same assertion failure has been reported on another thread[1].
> > > > Since I could reproduce this issue several times in my environment
> > > > I've investigated the root cause.
> > > >
> > > > I think there is a race condition of updating
> > > > procArray->replication_slot_xmin by CreateInitDecodingContext() and
> > > > LogicalConfirmReceivedLocation().
> > > >
> > > > What I observed in the test was that a walsender process called:
> > > > SnapBuildProcessRunningXacts()
> > > >   LogicalIncreaseXminForSlot()
> > > > LogicalConfirmReceivedLocation()
> > > >   ReplicationSlotsComputeRequiredXmin(false).
> > > >
> > > > In ReplicationSlotsComputeRequiredXmin() it acquired the
> > > > ReplicationSlotControlLock and got 0 as the minimum xmin since there
> > > > was no wal sender having effective_xmin.
> > > >
> > >
> > > What about the current walsender process which is processing
> > > running_xacts via SnapBuildProcessRunningXacts()? Isn't that walsender
> > > slot's effective_xmin have a non-zero value? If not, then why?
> >
> > Normal walsenders which are not for tablesync create a replication slot with
> > NOEXPORT_SNAPSHOT option. I think in this case, CreateInitDecodingContext() 
> > is
> > called with need_full_snapshot = false, and slot->effective_xmin is not 
> > updated.
>
> Right. This is how we create a slot used by an apply worker.
>

I was thinking about how that led to this problem because
GetOldestSafeDecodingTransactionId() ignores InvalidTransactionId. It
seems that is possible when both builder->xmin and
replication_slot_catalog_xmin precede replication_slot_catalog_xmin.
Do you see any different reason for it?

-- 
With Regards,
Amit Kapila.




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-01-29 Thread John Naylor
On Sun, Jan 29, 2023 at 9:50 PM Masahiko Sawada 
wrote:
>
> On Sat, Jan 28, 2023 at 8:33 PM John Naylor
>  wrote:

> > The first implementation should be simple, easy to test/verify, easy to
understand, and easy to replace. As much as possible anyway.
>
> Yes, but if a concurrent writer waits for another process to finish
> the iteration, it ends up waiting on a lwlock, which is not
> interruptible.
>
> >
> > > So the idea is that we set iter_active to true (with the
> > > lock in exclusive mode), and prevent concurrent updates when the flag
> > > is true.
> >
> > ...by throwing elog(ERROR)? I'm not so sure users of this API would
prefer that to waiting.
>
> Right. I think if we want to wait rather than an ERROR, the waiter
> should wait in an interruptible way, for example, a condition
> variable. I did a simpler way in the v22 patch.
>
> ...but looking at dshash.c, dshash_seq_next() seems to return an entry
> while holding a lwlock on the partition. My assumption might be wrong.

Using partitions there makes holding a lock less painful on average, I
imagine, but I don't know the details there.

If we make it clear that the first committed version is not (yet) designed
for high concurrency with mixed read-write workloads, I think waiting (as a
protocol) is fine. If waiting is a problem for some use case, at that point
we should just go all the way and replace the locking entirely. In fact, it
might be good to spell this out in the top-level comment and include a link
to the second ART paper.

> > [thinks some more...] Is there an API-level assumption that hasn't been
spelled out? Would it help to have a parameter for whether the iteration
function wants to reserve the privilege to perform writes? It could take
the appropriate lock at the start, and there could then be multiple
read-only iterators, but only one read/write iterator. Note, I'm just
guessing here, and I don't want to make things more difficult for future
improvements.
>
> Seems a good idea. Given the use case for parallel heap vacuum, it
> would be a good idea to support having multiple read-only iterators. The
> iteration in v22 is read-only, so if we want to support a read-write
> iterator, we would need to support a function that modifies the
> current key-value returned by the iteration.

Okay, so updating during iteration is not currently supported. It could in
the future, but I'd say that can also wait for fine-grained concurrency
support. Intermediate-term, we should at least make it straightforward to
support:

1) parallel heap vacuum  -> multiple read-only iterators
2) parallel heap pruning -> multiple writers

It may or may not be worth it for someone to actually start either of those
projects, and there are other ways to improve vacuum that may be more
pressing. That said, it seems the tid store with global locking would
certainly work fine for #1 and maybe "not too bad" for #2. #2 can also
mitigate waiting by using larger batching, or the leader process could
"pre-warm" the tid store with zero-values using block numbers from the
visibility map.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Deadlock between logrep apply worker and tablesync worker

2023-01-29 Thread vignesh C
On Fri, 27 Jan 2023 at 17:46, Amit Kapila  wrote:
>
> On Fri, Jan 27, 2023 at 3:45 PM vignesh C  wrote:
> >
> > On Mon, 23 Jan 2023 at 10:52, Amit Kapila  wrote:
> > >
> > > IIRC, this is done to prevent concurrent drops of origin drop say by
> > > exposed API pg_replication_origin_drop(). See the discussion in [1]
> > > related to it. If we want we can optimize it so that we can acquire
> > > the lock on the specific origin as mentioned in comments
> > > replorigin_drop_by_name() but it was not clear that this operation
> > > would be frequent enough.
> >
> > Here is an attached patch to lock the replication origin record using
> > LockSharedObject instead of locking pg_replication_origin relation in
> > ExclusiveLock mode. Now tablesync worker will wait only if the
> > tablesync worker is trying to drop the same replication origin which
> > has already been dropped by the apply worker, the other tablesync
> > workers will be able to successfully drop the replication origin
> > without any wait.
> >
>
> There is a code in the function replorigin_drop_guts() that uses the
> functionality introduced by replorigin_exists(). Can we reuse this
> function for the same?
>
> Also, it would be good if you can share the numbers for different runs
> of "src/test/subscription/t/002_types.pl" before and after the patch.

Using only Tom Lane's fix, I noticed that the execution time varies
between 3.6 and 4.4 seconds.
Using only the "Changing to LockSharedObject" fix, the execution time
varies between 3.8 and 4.6 seconds.
Using the combined fix (Tom Lane's fix + changing to LockSharedObject),
the execution time varies between 3.5 and 3.8 seconds.
I felt both changes will be required, as together they also handle the
scenario where both the apply worker and the tablesync worker try to
drop the same replication origin.

The execution results for the same:
With only Tom Lane's fix:
[12:25:32] t/002_types.pl .. ok 3604 ms ( 0.00 usr  0.00 sys +
2.26 cusr  0.37 csys =  2.63 CPU)
[12:25:48] t/002_types.pl .. ok 3788 ms ( 0.00 usr  0.00 sys +
2.24 cusr  0.39 csys =  2.63 CPU)
[12:26:01] t/002_types.pl .. ok 3783 ms ( 0.00 usr  0.00 sys +
2.42 cusr  0.37 csys =  2.79 CPU)
[12:26:14] t/002_types.pl .. ok 3845 ms ( 0.00 usr  0.00 sys +
2.38 cusr  0.44 csys =  2.82 CPU)
[12:26:29] t/002_types.pl .. ok 3923 ms ( 0.00 usr  0.00 sys +
2.54 cusr  0.39 csys =  2.93 CPU)
[12:26:42] t/002_types.pl .. ok 4416 ms ( 0.00 usr  0.00 sys +
2.73 cusr  0.48 csys =  3.21 CPU)
[12:26:55] t/002_types.pl .. ok 4310 ms ( 0.00 usr  0.00 sys +
2.62 cusr  0.39 csys =  3.01 CPU)
[12:27:09] t/002_types.pl .. ok 4168 ms ( 0.00 usr  0.00 sys +
2.67 cusr  0.46 csys =  3.13 CPU)
[12:27:21] t/002_types.pl .. ok 4167 ms ( 0.00 usr  0.00 sys +
2.46 cusr  0.53 csys =  2.99 CPU)
[12:27:34] t/002_types.pl .. ok 4144 ms ( 0.00 usr  0.00 sys +
2.59 cusr  0.41 csys =  3.00 CPU)
[12:27:46] t/002_types.pl .. ok 3982 ms ( 0.00 usr  0.00 sys +
2.52 cusr  0.41 csys =  2.93 CPU)
[12:28:03] t/002_types.pl .. ok 4190 ms ( 0.01 usr  0.00 sys +
2.67 cusr  0.46 csys =  3.14 CPU)

With only "Changing to LockSharedObject" fix:
[12:33:02] t/002_types.pl .. ok 3815 ms ( 0.00 usr  0.00 sys +
2.30 cusr  0.38 csys =  2.68 CPU)
[12:33:16] t/002_types.pl .. ok 4295 ms ( 0.00 usr  0.00 sys +
2.66 cusr  0.42 csys =  3.08 CPU)
[12:33:31] t/002_types.pl .. ok 4270 ms ( 0.00 usr  0.00 sys +
2.72 cusr  0.44 csys =  3.16 CPU)
[12:33:44] t/002_types.pl .. ok 4460 ms ( 0.00 usr  0.00 sys +
2.78 cusr  0.45 csys =  3.23 CPU)
[12:33:58] t/002_types.pl .. ok 4340 ms ( 0.01 usr  0.00 sys +
2.67 cusr  0.45 csys =  3.13 CPU)
[12:34:11] t/002_types.pl .. ok 4142 ms ( 0.00 usr  0.00 sys +
2.58 cusr  0.42 csys =  3.00 CPU)
[12:34:24] t/002_types.pl .. ok 4459 ms ( 0.00 usr  0.00 sys +
2.76 cusr  0.49 csys =  3.25 CPU)
[12:34:38] t/002_types.pl .. ok 4427 ms ( 0.00 usr  0.00 sys +
2.68 cusr  0.48 csys =  3.16 CPU)
[12:35:10] t/002_types.pl .. ok 4642 ms ( 0.00 usr  0.00 sys +
2.84 cusr  0.55 csys =  3.39 CPU)
[12:35:22] t/002_types.pl .. ok 4047 ms ( 0.01 usr  0.00 sys +
2.49 cusr  0.46 csys =  2.96 CPU)
[12:35:32] t/002_types.pl .. ok 4505 ms ( 0.01 usr  0.00 sys +
2.90 cusr  0.45 csys =  3.36 CPU)
[12:36:03] t/002_types.pl .. ok 4088 ms ( 0.00 usr  0.00 sys +
2.51 cusr  0.42 csys =  2.93 CPU)

002_types with combination of Tom Lane's and "Changing to LockSharedObject" fix:
[10:22:04] t/002_types.pl .. ok 3730 ms ( 0.00 usr  0.00 sys +
2.30 cusr  0.41 csys =  2.71 CPU)
[10:23:40] t/002_types.pl .. ok 3666 ms ( 0.00 usr  0.00 sys +
2.16 cusr  0.42 csys =  2.58 CPU)
[10:23:31] t/002_types.pl .. ok 3665 ms ( 0.00 usr  0.00 sys +
2.31 cusr  0.40 csys =  2.71 CPU)
[10:23:23] t/002_types.pl .. ok 3500 ms ( 0.00 usr  0.00 sys +
2.20 cusr  0.36 csys =  2.56 CPU)
[10:23:14] t/002_types.pl .. ok 3704 ms ( 0.00 usr  0.00 sys +
2.36 

Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-29 Thread Kyotaro Horiguchi
At Mon, 30 Jan 2023 08:51:05 +0530, Amit Kapila  wrote 
in 
> On Mon, Jan 30, 2023 at 8:32 AM Kyotaro Horiguchi
>  wrote:
> >
> > At Sat, 28 Jan 2023 04:28:29 +, "Takamichi Osumi (Fujitsu)" 
> >  wrote in
> > > On Friday, January 27, 2023 8:00 PM Amit Kapila  
> > > wrote:
> > > > So, you have changed min_apply_delay from int64 to int32, but you 
> > > > haven't
> > > > mentioned the reason for the same? We use 'int' for the similar 
> > > > parameter
> > > > recovery_min_apply_delay, so, ideally, it makes sense but still better 
> > > > to tell your
> > > > reason explicitly.
> > > Yes. It's because I thought I need to make this feature consistent with 
> > > the recovery_min_apply_delay.
> > > This feature handles the range same as the recovery_min_apply delay from 
> > > 0 to INT_MAX now
> > > so should be adjusted to match it.
> >
> > INT_MAX can stick out of int32 on some platforms. (I'm not sure where
> > that actually happens, though.) We can use PG_INT32_MAX instead.
> >
> 
> But in other integer GUCs including recovery_min_apply_delay, we use
> INT_MAX, so not sure if it is a good idea to do something different
> here.

The GUC is not stored in a catalog, but... oh, it is multiplied by
1000. So if it is larger than (INT_MAX / 1000), it overflows.  If we
officially accept that behavior (which I don't think is great, even if
only for impractical values), I don't object further.
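
To illustrate (a minimal sketch of the hazard, not the patch's code;
delay_ms and delay_us are made-up names):

/* Sketch only: the ms-to-us conversion must be widened before multiplying.
 * "delay_ms * 1000" done in plain int overflows once delay_ms > INT_MAX / 1000. */
int     delay_ms = INT_MAX;                     /* value from the option, in ms */
int64   delay_us = (int64) delay_ms * 1000;     /* widen first; no overflow */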

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-29 Thread Peter Smith
Here are my review comments for v88-0002.

==
General

1.
The test cases are checking the log content but they are not checking
for debug logs or untranslated elogs -- they are expecting a normal
ereport LOG that might be translated. I’m not sure if that is OK, or
if it is a potential problem.

==
doc/src/sgml/config.sgml

2.
On the publisher side, logical_replication_mode allows allows
streaming or serializing changes immediately in logical decoding. When
set to immediate, stream each change if streaming option (see optional
parameters set by CREATE SUBSCRIPTION) is enabled, otherwise,
serialize each change. When set to buffered, the decoding will stream
or serialize changes when logical_decoding_work_mem is reached.

2a.
typo "allows allows"  (Kuroda-san reported same)

2b.
"if streaming option" --> "if the streaming option"

~~~

3.
On the subscriber side, if streaming option is set to parallel,
logical_replication_mode also allows the leader apply worker to send
changes to the shared memory queue or to serialize changes.

SUGGESTION
On the subscriber side, if the streaming option is set to parallel,
logical_replication_mode can be used to direct the leader apply worker
to send changes to the shared memory queue or to serialize changes.

==
src/backend/utils/misc/guc_tables.c

4.
  {
  {"logical_replication_mode", PGC_USERSET, DEVELOPER_OPTIONS,
- gettext_noop("Controls when to replicate each change."),
- gettext_noop("On the publisher, it allows streaming or serializing
each change in logical decoding."),
+ gettext_noop("Controls the internal behavior of logical replication
publisher and subscriber"),
+ gettext_noop("On the publisher, it allows streaming or "
+ "serializing each change in logical decoding. On the "
+ "subscriber, in parallel streaming mode, it allows "
+ "the leader apply worker to serialize changes to "
+ "files and notifies the parallel apply workers to "
+ "read and apply them at the end of the transaction."),
  GUC_NOT_IN_SAMPLE
  },
Suggest re-wording the long description (subscriber part) to be more
like the documentation text.

BEFORE
On the subscriber, in parallel streaming mode, it allows the leader
apply worker to serialize changes to files and notifies the parallel
apply workers to read and apply them at the end of the transaction.

SUGGESTION
On the subscriber, if the streaming option is set to parallel, it
directs the leader apply worker to send changes to the shared memory
queue or to serialize changes and apply them at the end of the
transaction.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-01-29 Thread Dilip Kumar
On Thu, Jan 26, 2023 at 12:39 PM John Naylor
 wrote:
>
>
> On Tue, Jan 24, 2023 at 1:17 PM Dilip Kumar  wrote:
> >
> > On Mon, Jan 23, 2023 at 6:00 PM John Naylor
> >  wrote:
> > >
> > > Attached is a rebase to fix conflicts from recent commits.
> >
> > I have reviewed v22-0022* patch and I have some comments.
> >
> > 1.
> > >It also changes to the column names max_dead_tuples and num_dead_tuples 
> > >and to
> > >show the progress information in bytes.
> >
> > I think this statement needs to be rephrased.
>
> Could you be more specific?

I mean the below statement in the commit message doesn't look
grammatically correct to me.

"It also changes to the column names max_dead_tuples and
num_dead_tuples and to show the progress information in bytes."

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Deadlock between logrep apply worker and tablesync worker

2023-01-29 Thread vignesh C
On Sat, 28 Jan 2023 at 11:26, Amit Kapila  wrote:
>
> On Sat, Jan 28, 2023 at 9:36 AM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Friday, January 27, 2023 8:16 PM Amit Kapila 
> > >
> > > On Fri, Jan 27, 2023 at 3:45 PM vignesh C  wrote:
> > > >
> > > > On Mon, 23 Jan 2023 at 10:52, Amit Kapila 
> > > wrote:
> > > > >
> > > > > IIRC, this is done to prevent concurrent drops of origin drop say by
> > > > > exposed API pg_replication_origin_drop(). See the discussion in [1]
> > > > > related to it. If we want we can optimize it so that we can acquire
> > > > > the lock on the specific origin as mentioned in comments
> > > > > replorigin_drop_by_name() but it was not clear that this operation
> > > > > would be frequent enough.
> > > >
> > > > Here is an attached patch to lock the replication origin record using
> > > > LockSharedObject instead of locking pg_replication_origin relation in
> > > > ExclusiveLock mode. Now tablesync worker will wait only if the
> > > > tablesync worker is trying to drop the same replication origin which
> > > > has already been dropped by the apply worker, the other tablesync
> > > > workers will be able to successfully drop the replication origin
> > > > without any wait.
> > > >
> > >
> > > There is a code in the function replorigin_drop_guts() that uses the
> > > functionality introduced by replorigin_exists(). Can we reuse this 
> > > function for
> > > the same?
> >
> > Maybe we can use SearchSysCacheExists1 to check the existence instead of
> > adding a new function.
> >
>
> Yeah, I think that would be better.
>
> > One comment about the patch.
> >
> > @@ -430,23 +445,21 @@ replorigin_drop_by_name(const char *name, bool 
> > missing_ok, bool nowait)
> > ...
> > +   /* Drop the replication origin if it has not been dropped already */
> > +   if (replorigin_exists(roident))
> > replorigin_drop_guts(rel, roident, nowait);
> >
> > If developer pass missing_ok as false, should we report an ERROR here
> > instead of silently return ?
> >
>
> One thing that looks a bit odd is that we will anyway have a similar
> check in replorigin_drop_guts() which is a static function and called
> from only one place, so, will it be required to check at both places?

There is a possibility that the initial check verifying that the
replication origin exists in replorigin_drop_by_name() succeeds, but by
the time we proceed further either the tablesync worker or the apply
worker has already dropped the replication origin. So it is better to
check again before calling replorigin_drop_guts(). Ideally the tuple
should still be valid inside replorigin_drop_guts(), but can we keep
that check as it is, to maintain consistency before calling
CatalogTupleDelete()?
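
To make that concrete, the re-check could look roughly like this (a sketch
only, not the patch; it assumes the REPLORIGIDENT syscache keyed by the
origin id and reuses the existing error wording):

/* Re-check existence right before dropping; sketch only. */
if (SearchSysCacheExists1(REPLORIGIDENT, ObjectIdGetDatum(roident)))
    replorigin_drop_guts(rel, roident, nowait);
else if (!missing_ok)
    ereport(ERROR,
            (errcode(ERRCODE_UNDEFINED_OBJECT),
             errmsg("replication origin \"%s\" does not exist", name)));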

Regards,
Vignesh




RE: Logical replication timeout problem

2023-01-29 Thread shiy.f...@fujitsu.com
On Sun, Jan 29, 2023 3:41 PM wangw.f...@fujitsu.com  
wrote:
> 
> I tested a mix transaction of transactional and non-transactional messages on
> the current HEAD and reproduced the timeout problem. I think this result is 
> OK.
> Because when decoding a transaction, non-transactional changes are processed
> directly and the function WalSndKeepaliveIfNecessary is called, while
> transactional changes are cached and processed after decoding. After decoding,
> only transactional changes will be processed (in the function
> ReorderBufferProcessTXN), so the timeout problem will still be reproduced.
> 
> After applying the v8 patch, the test mentioned above didn't reproduce the
> timeout problem (Attach this test script 'test_with_nontransactional.sh').
> 
> Attach the new patch.
> 

Thanks for updating the patch. Here is a comment.

In update_progress_txn_cb_wrapper(), it looks like we need to reset
changes_count to 0 after calling OutputPluginUpdateProgress(); otherwise,
once 100 changes have been counted, OutputPluginUpdateProgress() will be
called for every subsequent change.
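
Roughly what I have in mind (a sketch only; the wrapper's real signature and
where changes_count lives are assumptions, not the patch's code):

#define CHANGES_THRESHOLD 100

static int changes_count = 0;

static void
update_progress_txn_cb_wrapper(LogicalDecodingContext *ctx)
{
    if (++changes_count >= CHANGES_THRESHOLD)
    {
        OutputPluginUpdateProgress(ctx);    /* signature simplified here */
        changes_count = 0;      /* without this reset, the call above fires
                                 * on every change once the count hits 100 */
    }
}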

Regards,
Shi yu



Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-29 Thread Amit Kapila
On Mon, Jan 30, 2023 at 8:32 AM Kyotaro Horiguchi
 wrote:
>
> At Sat, 28 Jan 2023 04:28:29 +, "Takamichi Osumi (Fujitsu)" 
>  wrote in
> > On Friday, January 27, 2023 8:00 PM Amit Kapila  
> > wrote:
> > > So, you have changed min_apply_delay from int64 to int32, but you haven't
> > > mentioned the reason for the same? We use 'int' for the similar parameter
> > > recovery_min_apply_delay, so, ideally, it makes sense but still better to 
> > > tell your
> > > reason explicitly.
> > Yes. It's because I thought I need to make this feature consistent with the 
> > recovery_min_apply_delay.
> > This feature handles the range same as the recovery_min_apply delay from 0 
> > to INT_MAX now
> > so should be adjusted to match it.
>
> INT_MAX can stick out of int32 on some platforms. (I'm not sure where
> that actually happens, though.) We can use PG_INT32_MAX instead.
>

But in other integer GUCs including recovery_min_apply_delay, we use
INT_MAX, so not sure if it is a good idea to do something different
here.

-- 
With Regards,
Amit Kapila.




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-29 Thread Kyotaro Horiguchi
At Sat, 28 Jan 2023 04:28:29 +, "Takamichi Osumi (Fujitsu)" 
 wrote in 
> On Friday, January 27, 2023 8:00 PM Amit Kapila  
> wrote:
> > So, you have changed min_apply_delay from int64 to int32, but you haven't
> > mentioned the reason for the same? We use 'int' for the similar parameter
> > recovery_min_apply_delay, so, ideally, it makes sense but still better to 
> > tell your
> > reason explicitly.
> Yes. It's because I thought I need to make this feature consistent with the 
> recovery_min_apply_delay.
> This feature handles the range same as the recovery_min_apply delay from 0 to 
> INT_MAX now
> so should be adjusted to match it.

INT_MAX can stick out of int32 on some platforms. (I'm not sure where
that actually happens, though.) We can use PG_INT32_MAX instead.

IMHO, we don't use plain int for catalog columns, and I agree that
int32 is sufficient since I don't think a delay of more than 49 days is
practical. On the other hand, maybe I wouldn't want to use int32 for the
intermediate calculations.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Prefetch the next tuple's memory during seqscans

2023-01-29 Thread David Rowley
On Wed, 4 Jan 2023 at 23:06, vignesh C  wrote:
> patching file src/backend/access/heap/heapam.c
> Hunk #1 FAILED at 451.
> 1 out of 6 hunks FAILED -- saving rejects to file
> src/backend/access/heap/heapam.c.rej

I've moved this patch to the next CF.  This patch has a dependency on
what's being proposed in [1]. I'd rather wait until that goes in
before rebasing this. Having this go in first will just make Melanie's
job harder on her heapam.c refactoring work.

David

[1] https://commitfest.postgresql.org/41/3987/




Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-01-29 Thread Thomas Munro
On Mon, Jan 30, 2023 at 6:26 AM Thomas Munro  wrote:
> out-of-order hazard

I've been trying to understand how that could happen, but my CPU-fu is
weak.  Let me try to write an argument for why it can't happen, so
that later I can look back at how stupid and naive I was.  We have A
B, and if the CPU sees no dependency and decides to execute B A
(pipelined), shouldn't an interrupt either wait for the whole
schemozzle to commit first (if not in a hurry), or nuke it, handle the
IPI and restart, or something?  After an hour of reviewing random
slides from classes on out-of-order execution and reorder buffers and
the like, I think the term for making sure that interrupts run with
the illusion of in-order execution maintained is called "precise
interrupts", and it is expected in all modern architectures, after the
early OoO pioneers lost their minds trying to program without it.  I
guess generally you want that because it would otherwise run your
interrupt handler in a completely uncertain environment, and
specifically in this case it would reach our signal handler which
reads A's output (waiting) and writes to B's input (is_set), so B IPI
A surely shouldn't be allowed?

As for compiler barriers, I see that elver's compiler isn't reordering the code.

Maybe it's a much dumber sort of a concurrency problem: stale cache
line due to missing barrier, but... commit db0f6cad488 made us also
set our own latch (a second time) when someone sets our latch in
releases 9.something to 13.  Which should mean that we're guaranteed
to see is_set = true in the scenario described, because we'll clobber
it ourselves if we have to, for good measure.

If our secondary SetLatch() sees it's already set and decides not to
set it, then it's possible that the code we interrupted was about to
run ResetLatch(), but any code doing that must next check its expected
exit condition (or it has a common-or-garden latch protocol bug, as
has been discovered from time in the tree...).
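
(For reference, the protocol in question is the usual latch loop, roughly
like this condensed sketch, where work_available() stands in for whatever
exit condition the caller really has:

for (;;)
{
    ResetLatch(MyLatch);

    if (work_available())       /* must be re-checked after ResetLatch(),
                                 * never only before it */
        break;

    (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
                     -1, PG_WAIT_EXTENSION);
}

Any caller that resets the latch and then sleeps again without re-checking
its condition has a latch protocol bug, barriers or not.)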

/me wanders away with a renewed fear of computers and the vast
complexities they hide




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-29 Thread Peter Smith
Patch v88-0001 LGTM.

Below are just some minor review comments about the commit message.

==
Commit message

1.
We have discussed having this parameter as a subscription option but
exposing a parameter that is primarily used for testing/debugging to users
didn't seem advisable and there is no other such parameter. The other
option we have discussed is to have a separate GUC for subscriber-side
testing but it appears that for the current testing existing parameter is
sufficient and avoids adding another GUC.

SUGGESTION
We discussed exposing this parameter as a subscription option, but it
did not seem advisable since it is primarily used for
testing/debugging and there is no other such developer option.

We also discussed having separate GUCs for publisher/subscriber-side,
but for current testing/debugging requirements, one GUC is sufficient.

~~

2.
Reviewed-by: Pater Smith, Kuroda Hayato, Amit Kapila

"Pater" --> "Peter"

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: pub/sub - specifying optional parameters without values.

2023-01-29 Thread Tom Lane
Zheng Li  writes:
> The behavior is due to the following code
> https://github.com/postgres/postgres/blob/master/src/backend/commands/define.c#L113

Yeah, so you can grep for places that have this behavior by looking
for defGetBoolean calls ... and there are quite a few.  That leads
me to the conclusion that we'd better invent a fairly stylized
documentation solution that we can plug into a lot of places,
rather than thinking of slightly different ways to say it and
places to say it.  I'm not necessarily opposed to Peter's desire
to fix replication-related commands first, but we have more to do
later.
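
(For context, the behavior itself boils down to the first test in
defGetBoolean() -- this is a condensed paraphrase of that function's
opening lines, not new code:

    /* condensed paraphrase of defGetBoolean() in commands/define.c */
    if (def->arg == NULL)
        return true;            /* option given with no value means true */

and every option parsed through that function inherits it.)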

I'm also not that thrilled with putting the addition up at the top
of the relevant text.  This behavior is at least two decades old,
so if we've escaped documenting it at all up to now, it can't be
that important to most people.

I also notice that ALTER SUBSCRIPTION has fully three different
sub-sections with about equal claims on this note, if we're going
to stick it directly into the affected option lists.

That all leads me to propose that we add the new text at the end of
the Parameters section in the affected man pages.  So about
like the attached.  (I left out alter_publication.sgml, as I'm not
sure it needs its own copy of this text --- it doesn't describe
individual parameters at all, just refers to CREATE PUBLICATION.)

regards, tom lane

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index ad93553a1d..964fcbb8ff 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -277,6 +277,13 @@ ALTER SUBSCRIPTION name RENAME TO <
 

   
+
+  
+   When specifying a parameter of type boolean, the
+   = value
+   part can be omitted, which is equivalent to
+   specifying TRUE.
+  
  
 
  
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index e229384e6f..370dac2ccf 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -217,6 +217,13 @@ CREATE PUBLICATION name

 
   
+
+  
+   When specifying a parameter of type boolean, the
+   = value
+   part can be omitted, which is equivalent to
+   specifying TRUE.
+  
  
 
  
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index eba72c6af6..51c45f17c7 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -354,6 +354,13 @@ CREATE SUBSCRIPTION subscription_name

   
+
+  
+   When specifying a parameter of type boolean, the
+   = value
+   part can be omitted, which is equivalent to
+   specifying TRUE.
+  
  
 
  


Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-01-29 Thread Thomas Munro
On Mon, Jan 30, 2023 at 7:08 AM Tomas Vondra
 wrote:
> However, the other lockup I saw was when using serial_schedule, so I
> guess lower concurrency makes it more likely.

FWIW "psql db -f src/test/regress/sql/join_hash.sql | cat" also works
(I mean, it's self-contained and doesn't need anything else from make
check; pipe to cat just disables the pager); that's how I've been
trying (and failing) to reproduce this on various computers.  I also
did a lot of "make -C src/test/modules/test_shm_mq installcheck" loops,
at the same time, because that's where my animal hung.




Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-01-29 Thread Tomas Vondra



On 1/29/23 18:53, Andres Freund wrote:
> Hi,
> 
> On 2023-01-29 18:39:05 +0100, Tomas Vondra wrote:
>> Will do, but I'll wait for another lockup to see how frequent it
>> actually is. I'm now at ~90 runs total, and it didn't happen again yet.
>> So hitting it after 15 runs might have been a bit of a luck.
> 
> Was there a difference in how much load there was on the machine between
> "reproduced in 15 runs" and "not reproed in 90"?  If indeed lack of barriers
> is related to the issue, an increase in context switches could substantially
> change the behaviour (in both directions).  More intra-process context
> switches can amount to "probabilistic barriers" because that'll be a
> barrier. At the same time it can make it more likely that the relatively
> narrow window in WaitEventSetWait() is hit, or lead to larger delays
> processing signals.
> 

No. The only thing the machine is doing is

  while /usr/bin/true; do
make check
  done

I can't reduce the workload further, because the "join" test is in a
separate parallel group (I cut down parallel_schedule). I could make the
machine busier, of course.

However, the other lockup I saw was when using serial_schedule, so I
guess lower concurrency makes it more likely.

But who knows ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Fix GUC_NO_SHOW_ALL test scenario in 003_check_guc.pl

2023-01-29 Thread Tom Lane
Justin Pryzby  writes:
> SELECT pg_show_all_settings() ought to keep working when called with no
> parameter.  Tom gave me a hint how to do that for system catalogs here:
> https://www.postgresql.org/message-id/17988.1584472...@sss.pgh.pa.us
> In this case, it might be cleaner to add a second entry to pg_proc.dat
> than to add "CREATE OR REPLACE FUNCTION" to system_functions.sql (I
> tried but couldn't get that to work just now).

I kind of think this is a lot of unnecessary work.  The case that is
problematic is a GUC that's marked GUC_NO_SHOW_ALL but not marked
GUC_NOT_IN_SAMPLE.  There aren't any of those, and I don't think there
are likely to be any in future, because it doesn't make a lot of sense.
Why don't we just make a policy against doing that, and enforce it
with an assertion somewhere in GUC initialization?
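
Something along these lines, say (a sketch only; the helper name and the
exact spot to hook it into GUC initialization are assumptions):

/* Sketch: per-GUC flag sanity check to run once at initialization time. */
static void
check_no_show_all_sanity(struct config_generic *gconf)
{
    /* GUC_NO_SHOW_ALL without GUC_NOT_IN_SAMPLE makes little sense */
    Assert((gconf->flags & GUC_NO_SHOW_ALL) == 0 ||
           (gconf->flags & GUC_NOT_IN_SAMPLE) != 0);
}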

regards, tom lane




Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-01-29 Thread Andres Freund
Hi,

On 2023-01-29 18:39:05 +0100, Tomas Vondra wrote:
> Will do, but I'll wait for another lockup to see how frequent it
> actually is. I'm now at ~90 runs total, and it didn't happen again yet.
> So hitting it after 15 runs might have been a bit of a luck.

Was there a difference in how much load there was on the machine between
"reproduced in 15 runs" and "not reproed in 90"?  If indeed lack of barriers
is related to the issue, an increase in context switches could substantially
change the behaviour (in both directions).  More intra-process context
switches can amount to "probabilistic barriers" because that'll be a
barrier. At the same time it can make it more likely that the relatively
narrow window in WaitEventSetWait() is hit, or lead to larger delays
processing signals.

Greetings,

Andres Freund




Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-01-29 Thread Andres Freund
Hi,

On 2023-01-30 06:26:02 +1300, Thomas Munro wrote:
> On Mon, Jan 30, 2023 at 1:53 AM Tomas Vondra
>  wrote:
> > So I did that - same configure options as the buildfarm client, and a
> > 'make check' (with only tests up to the 'join' suite, because that's
> > where it got stuck before). And it took only ~15 runs (~1h) to hit this
> > again on dikkop.
> 
> That's good news.

Indeed.

As annoying as it is, it might be worth reproing it once or twice more, just
to have a feeling for how long we need to run to have confidence in a fix.


> I was talking to Andres on IM about this yesterday and he pointed out
> a potential out-of-order hazard: WaitEventSetWait() sets "waiting" (to
> tell the signal handler to write to the self-pipe) and then reads
> latch->is_set with neither compiler nor memory barrier, which doesn't
> seem right because we might see a value of latch->is_set from before
> "waiting" was true, and yet the signal handler might also have run
> while "waiting" was false so the self-pipe doesn't save us, despite
> the length of the comment about that.  Can you reproduce it with this
> change?
> 
> --- a/src/backend/storage/ipc/latch.c
> +++ b/src/backend/storage/ipc/latch.c
> @@ -1011,6 +1011,7 @@ WaitEventSetWait(WaitEventSet *set, long timeout,
>  * ordering, so that we cannot miss seeing is_set if a 
> notificat
> ion
>  * has already been queued.
>  */
> +   pg_memory_barrier();
> if (set->latch && set->latch->is_set)
> {
> occurred_events->fd = PGINVALID_SOCKET;

I think we need a barrier in SetLatch(), after is_set = true. We have that in
some of the newer branches (due to the maybe_sleeping logic), but not in the
older branches.
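
Abridged sketch of where that would go (not a tested patch; the wakeup
logic below the barrier is elided):

void
SetLatch(Latch *latch)
{
    /* Quick exit if already set */
    if (latch->is_set)
        return;

    latch->is_set = true;

    pg_memory_barrier();    /* proposed: make the is_set store visible
                             * before we look at the waiter's state and
                             * decide how to wake it (self-pipe/SIGUSR1) */

    /* ... existing wakeup logic follows here ... */
}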

Greetings,

Andres Freund




Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-01-29 Thread Tomas Vondra



On 1/29/23 18:26, Thomas Munro wrote:
> On Mon, Jan 30, 2023 at 1:53 AM Tomas Vondra
>  wrote:
>> So I did that - same configure options as the buildfarm client, and a
>> 'make check' (with only tests up to the 'join' suite, because that's
>> where it got stuck before). And it took only ~15 runs (~1h) to hit this
>> again on dikkop.
> 
> That's good news.
> 
>> I managed to collect the fstat/procstat stuff Thomas asked for, and the
>> backtraces - attached. I still have the core files, in case we look at
>> something. As before, running gcore on the second worker (29081) gets
>> this unstuck - it sends some signal that apparently wakes it up.
> 
> Thanks!  As expected, no bytes in the pipe for any those processes.
> Unfortunately I gave the wrong procstat command, it should be -i, not
> -j.  Does "procstat -i /path/to/core | grep USR1" show P (pending) for
> that stuck process?  Silly question really, I don't really expect
> poll() to be misbehaving in such a basic way.
> 

It shows "--C" for all three processes, which should mean "will be caught".

> I was talking to Andres on IM about this yesterday and he pointed out
> a potential out-of-order hazard: WaitEventSetWait() sets "waiting" (to
> tell the signal handler to write to the self-pipe) and then reads
> latch->is_set with neither compiler nor memory barrier, which doesn't
> seem right because we might see a value of latch->is_set from before
> "waiting" was true, and yet the signal handler might also have run
> while "waiting" was false so the self-pipe doesn't save us, despite
> the length of the comment about that.  Can you reproduce it with this
> change?
> 

Will do, but I'll wait for another lockup to see how frequent it
actually is. I'm now at ~90 runs total, and it didn't happen again yet.
So hitting it after 15 runs might have been a bit of a luck.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-01-29 Thread Thomas Munro
On Mon, Jan 30, 2023 at 1:53 AM Tomas Vondra
 wrote:
> So I did that - same configure options as the buildfarm client, and a
> 'make check' (with only tests up to the 'join' suite, because that's
> where it got stuck before). And it took only ~15 runs (~1h) to hit this
> again on dikkop.

That's good news.

> I managed to collect the fstat/procstat stuff Thomas asked for, and the
> backtraces - attached. I still have the core files, in case we look at
> something. As before, running gcore on the second worker (29081) gets
> this unstuck - it sends some signal that apparently wakes it up.

Thanks!  As expected, no bytes in the pipe for any those processes.
Unfortunately I gave the wrong procstat command, it should be -i, not
-j.  Does "procstat -i /path/to/core | grep USR1" show P (pending) for
that stuck process?  Silly question really, I don't really expect
poll() to be misbehaving in such a basic way.

I was talking to Andres on IM about this yesterday and he pointed out
a potential out-of-order hazard: WaitEventSetWait() sets "waiting" (to
tell the signal handler to write to the self-pipe) and then reads
latch->is_set with neither compiler nor memory barrier, which doesn't
seem right because we might see a value of latch->is_set from before
"waiting" was true, and yet the signal handler might also have run
while "waiting" was false so the self-pipe doesn't save us, despite
the length of the comment about that.  Can you reproduce it with this
change?

--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1011,6 +1011,7 @@ WaitEventSetWait(WaitEventSet *set, long timeout,
 * ordering, so that we cannot miss seeing is_set if a notificat
ion
 * has already been queued.
 */
+   pg_memory_barrier();
if (set->latch && set->latch->is_set)
{
occurred_events->fd = PGINVALID_SOCKET;




Re: Fix GUC_NO_SHOW_ALL test scenario in 003_check_guc.pl

2023-01-29 Thread Justin Pryzby
On Sun, Jan 29, 2023 at 05:22:13PM +0530, Nitin Jadhav wrote:
> > We could extend pg_show_all_settings() with a boolean parameter to
> > enforce listing all the parameters, even the ones that are marked as
> > NOSHOW, but this does not count on GetConfigOptionValues() that could
> > force a parameter to become noshow on a superuser-only GUC depending
> > on the role that's running the function.  At the end, we'd better rely
> > on a separate superuser-only function to do this job, aka option 1.
> 
> I had started a separate thread [1] to refactor the code around
> GetConfigOptionValues() and the patch is already committed. Now it
> makes our job simpler to extend pg_show_all_settings() with a boolean
> parameter to enforce listing all the parameters, even the ones that
> are marked as NOSHOW. I have attached the patch for the same. Kindly
> look into it and share your thoughts.

SELECT pg_show_all_settings() ought to keep working when called with no
parameter.  Tom gave me a hint how to do that for system catalogs here:
https://www.postgresql.org/message-id/17988.1584472...@sss.pgh.pa.us

In this case, it might be cleaner to add a second entry to pg_proc.dat
than to add "CREATE OR REPLACE FUNCTION" to system_functions.sql (I
tried but couldn't get that to work just now).

-- 
Justin




Re: [Commitfest 2023-01] has started

2023-01-29 Thread vignesh C
On Sun, 22 Jan 2023 at 22:00, vignesh C  wrote:
>
> On Sun, 15 Jan 2023 at 23:02, vignesh C  wrote:
> >
> > On Sun, 8 Jan 2023 at 21:00, vignesh C  wrote:
> > >
> > > On Tue, 3 Jan 2023 at 13:13, vignesh C  wrote:
> > > >
> > > > Hi All,
> > > >
> > > > Just a reminder that Commitfest 2023-01 has started.
> > > > There are many patches based on the latest run from [1] which require
> > > > a) Rebased on top of head b) Fix compilation failures c) Fix test
> > > > failure, please have a look and rebase it so that it is easy for the
> > > > reviewers and committers:
> > > > 1. TAP output format for pg_regress
> > > > 2. Add BufFileRead variants with short read and EOF detection
> > > > 3. Add SHELL_EXIT_CODE variable to psql
> > > > 4. Add foreign-server health checks infrastructure
> > > > 5. Add last_vacuum_index_scans in pg_stat_all_tables
> > > > 6. Add index scan progress to pg_stat_progress_vacuum
> > > > 7. Add the ability to limit the amount of memory that can be allocated
> > > > to backends.
> > > > 8. Add tracking of backend memory allocated to pg_stat_activity
> > > > 9. CAST( ... ON DEFAULT)
> > > > 10. CF App: add "Returned: Needs more interest" close status
> > > > 11. CI and test improvements
> > > > 12. Cygwin cleanup
> > > > 13. Expand character set for ltree labels
> > > > 14. Fix tab completion MERGE
> > > > 15. Force streaming every change in logical decoding
> > > > 16. More scalable multixacts buffers and locking
> > > > 17. Move SLRU data into the regular buffer pool
> > > > 18. Move extraUpdatedCols out of RangeTblEntry
> > > > 19.New [relation] options engine
> > > > 20. Optimizing Node Files Support
> > > > 21. PGDOCS - Stats views and functions not in order?
> > > > 22. POC: Lock updated tuples in tuple_update() and tuple_delete()
> > > > 23. Parallelize correlated subqueries that execute within each worker
> > > > 24. Pluggable toaster
> > > > 25. Prefetch the next tuple's memory during seqscans
> > > > 26. Pulling up direct-correlated ANY_SUBLINK
> > > > 27. Push aggregation down to base relations and joins
> > > > 28. Reduce timing overhead of EXPLAIN ANALYZE using rdtsc
> > > > 29. Refactor relation extension, faster COPY
> > > > 30. Remove NEW placeholder entry from stored view query range table
> > > > 31. TDE key management patches
> > > > 32. Use AF_UNIX for tests on Windows (ie drop fallback TCP code)
> > > > 33. Windows filesystem support improvements
> > > > 34. making relfilenodes 56 bit
> > > > 35. postgres_fdw: commit remote (sub)transactions in parallel during 
> > > > pre-commit
> > > > 36.recovery modules
> > > >
> > > > Commitfest status as of now:
> > > > Needs review:177
> > > > Waiting on Author:   47
> > > > Ready for Committer:  20
> > > > Committed:  31
> > > > Withdrawn:4
> > > > Rejected:   0
> > > > Returned with Feedback:  0
> > > > Total:  279
> > > >
> > > > We will be needing more members to actively review the patches to get
> > > > more patches to the committed state. I would like to remind you that
> > > > each patch submitter is expected to review at least one patch from
> > > > another submitter during the CommitFest, those members who have not
> > > > picked up patch for review please pick someone else's patch to review
> > > > as soon as you can.
> > > > I'll send out reminders this week to get your patches rebased and
> > > > update the status of the patch accordingly.
> > > >
> > > > [1] - http://cfbot.cputube.org/
> > >
> > > Hi Hackers,
> > >
> > > Here's a quick status report after the first week (I think only about
> > > 9 commits happened during the week, the rest were pre-CF activity):
> > >
> > > status   |   3rd Jan |  w1
> > > -+---+-
> > > Needs review:|   177 | 149
> > > Waiting on Author:   |47 |  60
> > > Ready for Committer: |20 |  23
> > > Committed:   |31 |  40
> > > Withdrawn:   | 4 |   7
> > > Rejected:| 0 |   0
> > > Returned with Feedback:  | 0 |   0
> > > Total:   |   279 | 279
> > >
> > > Here is a list of "Needs review" entries for which there has not been
> > > much communication on the thread and needs help in proceeding further.
> > > Please pick one of these and help us on how to proceed further:
> > > pgbench: using prepared BEGIN statement in a pipeline could cause an
> > > error | Yugo Nagata
> > > Fix dsa_free() to re-bin segment | Dongming Liu
> > > pg_rewind: warn when checkpoint hasn't happened after promotion | James 
> > > Coleman
> > > Work around non-atomic read of read of control file on ext4 | Thomas Munro
> > > Rethinking the implementation of ts_headline | Tom Lane
> > > Fix GetWALAvailability function code comments for WALAVAIL_REMOVED
> > > return value | sirisha chamarti

Re: Bug #17759: GENERATED columns not computed during MERGE

2023-01-29 Thread Tom Lane
Dean Rasheed  writes:
> I spent a little time investigating bug #17759 [1] in more detail.
> Initially, I thought that it had been fixed by 3f7836ff65, but it
> turns out that's not the case.

Thanks for looking closer!  I had felt a little unsure about that
too, but hadn't gotten to poking into it.

> So I think we need the attached in HEAD and v15.

Looks good to me.

regards, tom lane




Re: Assertion failure in SnapBuildInitialSnapshot()

2023-01-29 Thread Masahiko Sawada
On Sat, Jan 28, 2023 at 11:54 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Amit, Sawada-san,
>
> I have also reproduced the failure on PG15 with some debug log, and I agreed 
> that
> somebody changed procArray->replication_slot_xmin to InvalidTransactionId.
>
> > > The same assertion failure has been reported on another thread[1].
> > > Since I could reproduce this issue several times in my environment
> > > I've investigated the root cause.
> > >
> > > I think there is a race condition of updating
> > > procArray->replication_slot_xmin by CreateInitDecodingContext() and
> > > LogicalConfirmReceivedLocation().
> > >
> > > What I observed in the test was that a walsender process called:
> > > SnapBuildProcessRunningXacts()
> > >   LogicalIncreaseXminForSlot()
> > > LogicalConfirmReceivedLocation()
> > >   ReplicationSlotsComputeRequiredXmin(false).
> > >
> > > In ReplicationSlotsComputeRequiredXmin() it acquired the
> > > ReplicationSlotControlLock and got 0 as the minimum xmin since there
> > > was no wal sender having effective_xmin.
> > >
> >
> > What about the current walsender process which is processing
> > running_xacts via SnapBuildProcessRunningXacts()? Isn't that walsender
> > slot's effective_xmin have a non-zero value? If not, then why?
>
> Normal walsenders which are not for tablesync create a replication slot with
> NOEXPORT_SNAPSHOT option. I think in this case, CreateInitDecodingContext() is
> called with need_full_snapshot = false, and slot->effective_xmin is not 
> updated.

Right. This is how we create a slot used by an apply worker.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-01-29 Thread Masahiko Sawada
On Sat, Jan 28, 2023 at 8:33 PM John Naylor
 wrote:
>
> On Thu, Jan 26, 2023 at 3:33 PM Masahiko Sawada  wrote:
> >
> > On Thu, Jan 26, 2023 at 3:54 PM John Naylor
> >  wrote:
>
> > I think that we need to prevent concurrent updates (RT_SET() and
> > RT_DELETE()) during the iteration to get the consistent result through
> > the whole iteration operation. Unlike other operations such as
> > RT_SET(), we cannot expect that a job doing something for each
> > key-value pair in the radix tree completes in a short time, so we
> > cannot keep holding the radix tree lock until the end of the
> > iteration.
>
> This sounds like a performance concern, rather than a correctness concern, is 
> that right? If so, I don't think we should worry too much about optimizing 
> simple locking, because it will *never* be fast enough for highly-concurrent 
> read-write workloads anyway, and anyone interested in those workloads will 
> have to completely replace the locking scheme, possibly using one of the 
> ideas in the last ART paper you mentioned.
>
> The first implementation should be simple, easy to test/verify, easy to 
> understand, and easy to replace. As much as possible anyway.

Yes, but if a concurrent writer waits for another process to finish
the iteration, it ends up waiting on a lwlock, which is not
interruptible.

>
> > So the idea is that we set iter_active to true (with the
> > lock in exclusive mode), and prevent concurrent updates when the flag
> > is true.
>
> ...by throwing elog(ERROR)? I'm not so sure users of this API would prefer 
> that to waiting.

Right. I think if we want to wait rather than raise an ERROR, the
waiter should wait in an interruptible way, for example on a condition
variable. I took a simpler approach in the v22 patch.

...but looking at dshash.c, dshash_seq_next() seems to return an entry
while holding a lwlock on the partition. My assumption might be wrong.
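
For reference, a dshash sequential scan is typically driven like this
(a minimal sketch; MyEntry and process_entry() are made-up names), and the
current partition's lwlock is indeed held across the body of the loop:

dshash_seq_status status;
MyEntry    *entry;

dshash_seq_init(&status, htab, false);      /* false = non-exclusive scan */
while ((entry = dshash_seq_next(&status)) != NULL)
{
    /* the partition lock is held while we look at entry */
    process_entry(entry);
}
dshash_seq_term(&status);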

>
> > > Since there were calls to LWLockAcquire/Release in the last version, I'm 
> > > a bit confused by this. Perhaps for the next patch, the email should 
> > > contain a few sentences describing how locking is intended to work, 
> > > including for iteration.
> >
> > The lock I'm thinking of adding is a simple readers-writer lock. This
> > lock is used for concurrent radix tree operations except for the
> > iteration. For operations concurrent to the iteration, I used a flag
> > for the reason I mentioned above.
>
> This doesn't tell me anything -- we already agreed on "simple reader-writer 
> lock", months ago I believe. And I only have a vague idea about the tradeoffs 
> made regarding iteration.
>
> + * WIP: describe about how locking works.
>
> A first draft of what is intended for this WIP would be a good start. This 
> WIP is from v23-0016, which contains no comments and a one-line commit 
> message. I'd rather not try closely studying that patch (or how it works with 
> 0011) until I have a clearer understanding of what requirements are assumed, 
> what trade-offs are considered, and how it should be tested.
>
> [thinks some more...] Is there an API-level assumption that hasn't been 
> spelled out? Would it help to have a parameter for whether the iteration 
> function wants to reserve the privilege to perform writes? It could take the 
> appropriate lock at the start, and there could then be multiple read-only 
> iterators, but only one read/write iterator. Note, I'm just guessing here, 
> and I don't want to make things more difficult for future improvements.

Seems a good idea. Given the use case for parallel heap vacuum, it
would be a good idea to support having multiple read-only iterators. The
iteration in v22 is read-only, so if we want to support a read-write
iterator, we would need to support a function that modifies the
current key-value pair returned by the iteration.

>
> > > Hmm, I wonder if we need to use the isolation tester. It's both a 
> > > blessing and a curse that the first client of this data structure is tid 
> > > lookup. It's a blessing because it doesn't present a highly-concurrent 
> > > workload mixing reads and writes and so simple locking is adequate. It's 
> > > a curse because to test locking and have any chance of finding bugs, we 
> > > can't rely on vacuum to tell us that because (as you've said) it might 
> > > very well work fine with no locking at all. So we must come up with test 
> > > cases ourselves.
> >
> > Using the isolation tester to test locking seems like a good idea. We
> > can include it in test_radixtree. But given that the locking in the
> > radix tree is very simple, the test case would be very simple. It may
> > be controversial whether it's worth adding such testing by adding both
> > the new test module and test cases.
>
> I mean that the isolation tester (or something else) would contain test 
> cases. I didn't mean to imply redundant testing.

Okay, understood.

>
> > I think the user (e.g, vacuumlazy.c) can pass the maximum offset
> > number to the parallel vacuum.
>
> Okay, 

Re: pg_stat_statements and "IN" conditions

2023-01-29 Thread Dmitry Dolgov
> On Sun, Jan 29, 2023 at 09:56:02AM -0300, Marcos Pegoraro wrote:
> On Sun, Jan 29, 2023 at 09:24, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > > On Fri, Jan 27, 2023 at 08:15:29PM +0530, vignesh C wrote:
> > > The patch does not apply on top of HEAD as in [1], please post a rebased
> > patch:
> >
> > Thanks. I think this one should do the trick.
> >
>
> There is a typo on DOC part
> +and it's length is larger than  const_merge_threshold
> ,
> +then array elements will contribure nothing to the query
> identifier.
> +Thus the query will get the same identifier no matter how many
> constants
>
> That "contribure" should be "contribute"

Indeed, thanks for noticing.
>From 1d980ef5f556c1684ea5c991965b2375bbdd139b Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Sun, 24 Jul 2022 11:43:25 +0200
Subject: [PATCH v11] Prevent jumbling of every element in ArrayExpr

pg_stat_statements produces multiple entries for queries like

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on number of parameters, because every element of ArrayExpr is
jumbled. In certain situations it's undesirable, especially if the list
becomes too large.

Make Const expressions contribute nothing to the jumble hash if they're
part of an ArrayExpr whose length is larger than the specified threshold.
Allow the threshold to be configured via the new GUC const_merge_threshold,
whose default value of zero disables this feature.

Reviewed-by: Zhihong Yu, Sergey Dudoladov, Robert Haas, Tom Lane
Tested-by: Chengxi Sun
---
 .../expected/pg_stat_statements.out   | 412 ++
 .../pg_stat_statements/pg_stat_statements.c   |  33 +-
 .../sql/pg_stat_statements.sql| 107 +
 doc/src/sgml/config.sgml  |  26 ++
 doc/src/sgml/pgstatstatements.sgml|  28 +-
 src/backend/nodes/queryjumblefuncs.c  | 105 -
 src/backend/utils/misc/guc_tables.c   |  13 +
 src/backend/utils/misc/postgresql.conf.sample |   2 +-
 src/include/nodes/queryjumble.h   |   5 +-
 9 files changed, 712 insertions(+), 19 deletions(-)

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 9ac5c87c3a..f18f34ae5b 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -1141,4 +1141,416 @@ SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
  2
 (1 row)
 
+--
+-- Consts merging
+--
+CREATE TABLE test_merge (id int, data int);
+-- IN queries
+-- No merging
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
+ id | data 
++--
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query  | calls 
++---
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6)  | 1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7)  | 1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8)  | 1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9)  | 1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10) | 1
+ SELECT pg_stat_statements_reset()  | 1
+ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0
+(7 rows)
+
+-- Normal
+SET const_merge_threshold = 5;
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3);
+ id | data 
++--
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query  | calls 
++---
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3)  | 1
+ SELECT pg_stat_statements_reset()  | 1
+ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0
+(3 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6);
+ id | 

Re: [PATCH] Fix old thinko in formula to compute sweight in numeric_sqrt().

2023-01-29 Thread Dean Rasheed
On Sat, 28 Jan 2023 at 22:14, Joel Jacobson  wrote:
>
> I found what appears to be a small harmless error in numeric.c,
> that seems worthwhile to fix only because it's currently causes confusion.
>

Shrug. Looking at git blame, it's been like that for about 20 years,
and I wasn't aware of it causing confusion.

> HEAD, patched:
> sweight = (arg.weight * DEC_DIGITS) / 2 + 1
>

You haven't actually said why this formula is more correct than the
current one. I believe that it is when arg.weight >= 0, but I'm not
convinced it's correct for arg.weight < 0.

Given that this is only an approximate computation, which ensures
roughly 16 significant digits in the result, but can't guarantee
exactly 16 digits, I'm not convinced of the benefits of changing it.

> Note, however, that it's still possible to find examples of when sqrt(numeric)
> produce results with different precision for different DEC_DIGITS/NBASE 
> values,
> but in such cases, it's intentional, and due to getting additional precision
> for free, since the larger the NBASE, the more decimal digits are produced
> at the same time per iteration in the calculation.
>
> Example:
>
> HEAD, unpatched
> DEC_DIGITS  sqrt(102::numeric)
> 4   10.09950493836208
> 2   10.099504938362078
> 1   10.0995049383620780
>
> HEAD, patched:
> DEC_DIGITS  sqrt(102::numeric)
> 4   10.099504938362078
> 2   10.09950493836208
> 1   10.09950493836208
>
> According to the comment in numeric_sqrt(), the goal is to give at least
> NUMERIC_MIN_SIG_DIGITS (16) significant digits.
>
> Since 10.09950493836208 has 16 significant digits, we can see above how
> DEC_DIGITS==2 causes an additional unnecessary significant digit to be 
> computed,
> and for DEC_DIGITS==1, two additional unnecessary significant digits are
> computed.
>
> The patched version returns 16 significant digits as expected for 
> DEC_DIGITS==2
> and DEC_DIGITS==1, and for DEC_DIGITS==4 we get an additional digit for free.
>

You lost me here. In unpatched HEAD, sqrt(102::numeric) produces
10.099504938362078, not 10.09950493836208 (with DEC_DIGITS = 4). And
wasn't your previous point that when DEC_DIGITS = 4, the new formula
is the same as the old one?

> In conclusion, the proposed patch fixes a harmless problem, but is important
> to fix, since otherwise, anyone who want to experiment with different
> DEC_DIGITS/NBASE combinations by changing the `#if 0` preprocessor values
> in the top of numeric.c will get surprising results from sqrt().
>

Anyone changing DEC_DIGITS/NBASE will find hundreds of regression test
failures due to changes in result precision all over the place (and
failures due to the overflow limit changing). I don't see why
numeric_sqrt() should be singled out for fixing.

> In passing, also add pow10[] values for DEC_DIGITS==2 and DEC_DIGITS==1,
> since otherwise it's not possible to compile such DEC_DIGITS values
> due to the assert:
>
> StaticAssertDecl(lengthof(pow10) == DEC_DIGITS, "mismatch with 
> DEC_DIGITS");
>

That might be worth doing, to ensure that the code still compiles for
other DEC_DIGITS/NBASE values. I'm not sure how useful that really is
anymore though. As the comment at the top says, it's kept mostly for
historical reasons.

Regards,
Dean




Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist

2023-01-29 Thread Jim Jones


On 27.01.23 21:13, Cary Huang wrote:

> I agree that it is a more elegant approach to add
> "sslcertmode=disable" on the client side to prevent sending default
> certificate.
>
> But, if the server does request clientcert but client uses
> "sslcertmode=disable" to connect and not give a certificate, it would
> also result in authentication failure. In this case, we actually would
> want to ignore "sslcertmode=disable" and send default certificates if
> found.


Those are all very good points.

> But, if the server does request clientcert but client uses
> "sslcertmode=disable" to connect and not give a certificate, it would
> also result in authentication failure. In this case, we actually would
> want to ignore "sslcertmode=disable" and send default certificates if
> found.


I'm just wondering if this is really necessary. If the server asks for a
certificate and the user explicitly says "I don't want to send it",
shouldn't it be OK for the server to return an authentication failure? I
mean, wouldn't it defeat the purpose of "sslcertmode=disable"? Although
it might indeed be quite handy, I'm not sure how I feel about explicitly
telling the client not to send a certificate and having it sent
anyway :)


Best, Jim





Re: pg_stat_statements and "IN" conditions

2023-01-29 Thread Marcos Pegoraro
On Sun, Jan 29, 2023 at 09:24, Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Fri, Jan 27, 2023 at 08:15:29PM +0530, vignesh C wrote:
> > The patch does not apply on top of HEAD as in [1], please post a rebased
> patch:
>
> Thanks. I think this one should do the trick.
>

There is a typo on DOC part
+and it's length is larger than  const_merge_threshold
,
+then array elements will contribure nothing to the query
identifier.
+Thus the query will get the same identifier no matter how many
constants

That "contribure" should be "contribute"

regards
Marcos


Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-01-29 Thread Tomas Vondra
On 1/28/23 13:05, Tomas Vondra wrote:
>
> FWIW I'll wait for dikkop to finish the current buildfarm run (it's
> currently chewing on HEAD) and then will try to do runs of the 'joins'
> test in a loop. That's where dikkop got stuck before.
>

So I did that - same configure options as the buildfarm client, and a
'make check' (with only tests up to the 'join' suite, because that's
where it got stuck before). And it took only ~15 runs (~1h) to hit this
again on dikkop.

As before, there are three processes - leader + 2 workers, but the query
is different - this time it's this one:

  -- A couple of other hash join tests unrelated to work_mem management.
  -- Check that EXPLAIN ANALYZE has data even if the leader doesn't
participate
  savepoint settings;
  set local max_parallel_workers_per_gather = 2;
  set local work_mem = '4MB';
  set local parallel_leader_participation = off;
  select * from hash_join_batches(
  $$
select count(*) from simple r join simple s using (id);
  $$);

I managed to collect the fstat/procstat stuff Thomas asked for, and the
backtraces - attached. I still have the core files, in case we look at
something. As before, running gcore on the second worker (29081) gets
this unstuck - it sends some signal that apparently wakes it up.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

GNU gdb (GDB) 12.1 [GDB v12.1 for FreeBSD]
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-portbld-freebsd14.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./tmp_install/usr/local/pgsql/bin/postgres...

warning: exec file is newer than core file.
[New LWP 100597]

warning: Could not load shared library symbols for /mnt/data/postgres/tmp_install/usr/local/pgsql/lib/plpgsql.so.
Do you need "set solib-search-path" or "set sysroot"?
Core was generated by `postgres: freebsd regression [local] SELECT'.
#0  _poll () at _poll.S:4
4	_poll.S: No such file or directory.
(gdb) bt
#0  _poll () at _poll.S:4
#1  0x891c1850 in __thr_poll (fds=0x4, nfds=1, timeout=-1) at /usr/src/lib/libthr/thread/thr_syscalls.c:338
#2  0x0076e930 in WaitEventSetWaitBlock (nevents=1, occurred_events=0x811ed188, cur_timeout=-1, set=0x9ba2e590) at latch.c:1171
#3  WaitEventSetWait (set=set@entry=0x9ba2e590, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x811ed188, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=134217731) at latch.c:1000
#4  0x0076ee18 in WaitLatchOrSocket (latch=0x9abe49c0, wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=-1, timeout@entry=0, wait_event_info=wait_event_info@entry=134217731) at latch.c:385
#5  0x0076eef0 in WaitLatch (latch=, wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0, wait_event_info=wait_event_info@entry=134217731) at latch.c:339
#6  0x006481ec in gather_readnext (gatherstate=) at nodeGather.c:367
#7  gather_getnext (gatherstate=0x9b824818) at nodeGather.c:256
#8  ExecGather (pstate=0x9b824818) at nodeGather.c:207
#9  0x00638824 in ExecProcNodeInstr (node=0x9b824818) at execProcnode.c:462
#10 0x00641708 in ExecProcNode (node=0x9b824818) at ../../../src/include/executor/executor.h:248
#11 fetch_input_tuple (aggstate=aggstate@entry=0x9b824368) at nodeAgg.c:407
#12 0x00642ee8 in agg_retrieve_direct (aggstate=0x9b824368) at nodeAgg.c:1746
#13 ExecAgg (pstate=0x9b824368) at nodeAgg.c:1561
#14 0x00638824 in ExecProcNodeInstr (node=0x9b824368) at execProcnode.c:462
#15 0x006313a4 in ExecProcNode (node=0x9b824368) at ../../../src/include/executor/executor.h:248
#16 ExecutePlan (execute_once=, dest=0xbbac78 , direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, use_parallel_mode=, planstate=0x9b824368, 
estate=0x9b824118) at execMain.c:1712
#17 standard_ExecutorRun (queryDesc=0x9b93d550, direction=, count=0, execute_once=) at execMain.c:353
#18 0x005ca960 in ExplainOnePlan (plannedstmt=plannedstmt@entry=0x9b93d438, into=into@entry=0x0, es=es@entry=0x905b39f8, 
queryString=queryString@entry=0x905e2918 "explain (analyze, format 'json') \n  select count(*) from simple r join simple s using (id);\n", params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, 
planduration=planduration@entry=0x811ed450) at explain.c:536
#19 0x005cacac in ExplainOneQuery (query=, cursorOptions=256, into=0x0, 

Re: pg_stat_statements and "IN" conditions

2023-01-29 Thread Dmitry Dolgov
> On Fri, Jan 27, 2023 at 08:15:29PM +0530, vignesh C wrote:
> The patch does not apply on top of HEAD as in [1], please post a rebased 
> patch:

Thanks. I think this one should do the trick.
>From 3c51561ddaecdbc82842fae4fab74cc33526f17c Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Sun, 24 Jul 2022 11:43:25 +0200
Subject: [PATCH v10] Prevent jumbling of every element in ArrayExpr

pg_stat_statements produces multiple entries for queries like

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on the number of parameters, because every element of ArrayExpr
is jumbled. In certain situations this is undesirable, especially if the
list becomes too large.

Make Const expressions contribute nothing to the jumble hash if they're
part of an ArrayExpr whose length is larger than the specified threshold.
Allow configuring the threshold via the new GUC const_merge_threshold
with the default value zero, which disables this feature.

Reviewed-by: Zhihong Yu, Sergey Dudoladov, Robert Haas, Tom Lane
Tested-by: Chengxi Sun
---
 .../expected/pg_stat_statements.out   | 412 ++
 .../pg_stat_statements/pg_stat_statements.c   |  33 +-
 .../sql/pg_stat_statements.sql| 107 +
 doc/src/sgml/config.sgml  |  26 ++
 doc/src/sgml/pgstatstatements.sgml|  28 +-
 src/backend/nodes/queryjumblefuncs.c  | 105 -
 src/backend/utils/misc/guc_tables.c   |  13 +
 src/backend/utils/misc/postgresql.conf.sample |   2 +-
 src/include/nodes/queryjumble.h   |   5 +-
 9 files changed, 712 insertions(+), 19 deletions(-)

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 9ac5c87c3a..f18f34ae5b 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -1141,4 +1141,416 @@ SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
  2
 (1 row)
 
+--
+-- Consts merging
+--
+CREATE TABLE test_merge (id int, data int);
+-- IN queries
+-- No merging
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
+ id | data 
++--
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query  | calls 
++---
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6)  | 1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7)  | 1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8)  | 1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9)  | 1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10) | 1
+ SELECT pg_stat_statements_reset()  | 1
+ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0
+(7 rows)
+
+-- Normal
+SET const_merge_threshold = 5;
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3);
+ id | data 
++--
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+ query  | calls 
++---
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3)  | 1
+ SELECT pg_stat_statements_reset()  | 1
+ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C" | 0
+(3 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
+ id | data 
++--
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
+ id | data 
++--
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+   

Re: Fix GUC_NO_SHOW_ALL test scenario in 003_check_guc.pl

2023-01-29 Thread Nitin Jadhav
> We could extend pg_show_all_settings() with a boolean parameter to
> enforce listing all the parameters, even the ones that are marked as
> NOSHOW, but this does not count on GetConfigOptionValues() that could
> force a parameter to become noshow on a superuser-only GUC depending
> on the role that's running the function.  At the end, we'd better rely
> on a separate superuser-only function to do this job, aka option 1.

I had started a separate thread [1] to refactor the code around
GetConfigOptionValues(), and that patch is already committed. This now
makes it simpler to extend pg_show_all_settings() with a boolean
parameter to enforce listing all the parameters, even the ones that
are marked as NOSHOW. I have attached a patch doing exactly that. Kindly
look into it and share your thoughts.
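
To illustrate the intended behaviour, usage would be roughly as follows
(the exact signature is whatever the attached patch settles on, so treat
the details as tentative):

  -- with true and sufficient privileges, GUC_NO_SHOW_ALL (and
  -- GUC_SUPERUSER_ONLY) parameters are listed as well
  SELECT count(*) FROM pg_show_all_settings(true);

  -- with false, the behaviour matches the existing one, i.e. what the
  -- pg_settings view shows
  SELECT count(*) FROM pg_show_all_settings(false);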

[1]: 
https://www.postgresql.org/message-id/flat/CALj2ACXZMOGEtjk%2Beh0Zeiz%3D46ETVkr0koYAjWt8SoJUJJUe9g%40mail.gmail.com#04705e421e0dc63b1f0c862ae4929e6f

Thanks & Regards,
Nitin Jadhav

On Wed, Jan 18, 2023 at 12:31 PM Nitin Jadhav
 wrote:
>
> > We would miss the names of the parameters that are marked as NO_SHOW,
> > missing from pg_settings, making debugging harder.
> >
> > This would make the test more costly by forcing one SQL for each
> > GUC..
>
> I agree.
>
>
> > We could extend pg_show_all_settings() with a boolean parameter to
> > enforce listing all the parameters, even the ones that are marked as
> > NOSHOW, but this does not count on GetConfigOptionValues() that could
> > force a parameter to become noshow on a superuser-only GUC depending
> > on the role that's running the function.  At the end, we'd better rely
> > on a separate superuser-only function to do this job, aka option 1.
>
> I did not get it completely. To understand it better, I considered
> adding a boolean parameter to pg_show_all_settings(). Then
> we would modify GetConfigOptionValues() like below [1]. When we call
> pg_show_all_settings(false), it behaves like the existing behaviour (with
> and without superuser). When we call
> pg_show_all_settings(true) with superuser privileges, it returns all
> parameters including GUC_NO_SHOW_ALL as well as GUC_SUPERUSER_ONLY.
> If we call pg_show_all_settings(true) without superuser privileges,
> then it returns all parameters except GUC_NO_SHOW_ALL and
> GUC_SUPERUSER_ONLY. Can't we do it this way? Please share your
> thoughts.
>
>
> > How much do we need to care with the duplication this would involve
> > with show_all_settings(), actually?  If you don't use the SRF macros,
> > the code would just be a couple of lines with InitMaterializedSRF()
> > doing a loop on GetConfigOptionValues().  Even if that means listing
> > twice the parameters in pg_proc.dat, the chances of adding new
> > parameters in pg_settings is rather low so that would be a one-time
> >  change?
>
> How about just fetching the parameter name instead of fetching all its
> details? This would meet our objective while also limiting the code
> duplication.
>
> [1]:
> static void
> GetConfigOptionValues(struct config_generic *conf, const char **values,
>   bool *noshow, bool is_show_all)
> {
> charbuffer[256];
>
> if (noshow)
> {
> if (((conf->flags & GUC_NO_SHOW_ALL) && !is_show_all) ||
>   ((conf->flags & GUC_NO_SHOW_ALL) &&
>   !has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_SETTINGS)) ||
> ((conf->flags & GUC_SUPERUSER_ONLY) &&
>  !has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_SETTINGS)))
> *noshow = true;
> else
> *noshow = false;
> }
> -
> -
> -
> }
>
> On Mon, Jan 16, 2023 at 7:58 AM Michael Paquier  wrote:
> >
> > On Sat, Jan 14, 2023 at 07:10:55PM +0530, Nitin Jadhav wrote:
> > > Option-1 is, expose a function like pg_settings_get_no_show_all()
> > > which just returns the parameters which are just listed as
> > > GUC_NO_SHOW_ALL (Not in combination with NOT_IN_SAMPLE). We can then
> > > use this function in the test file and verify whether there are config
> > > entries for these.
> > >
> > > Option-2 is, if exposing new function and that too to expose
> > > parameters which are listed as GUC_NO_SHOW_ALL is not recommended,
> > > then how about exposing a function like pg_settings_get_count() which
> > > returns the count of all parameters including GUC_NO_SHOW_ALL. We can
> > > then use this number to verify whether these many are present in the
> > > sample config file. But we cannot show the name of the parameters if
> > > it is not matching. We can just display an error saying "Parameter
> > > with GUC_NO_SHOW_ALL is missing from postgresql.conf.sample".
> >
> > We would miss the names of the parameters that are marked as NO_SHOW,
> > missing from pg_settings, making debugging harder.
> >
> > > Option-3 is, if exposing both of the above functions is not
> > > recommended, then we can use the existing function
> > > pg_settings_get_flags() for 

Re: Refactoring postgres_fdw/connection.c

2023-01-29 Thread Etsuro Fujita
Hi Fujii-san,

On Thu, Sep 15, 2022 at 12:17 AM Fujii Masao
 wrote:
> On 2022/09/05 15:17, Etsuro Fujita wrote:
> > I'm not sure it's a good idea to change the function's name, because
> > that would make backpatching hard.  To avoid that, how about
> > introducing a workhorse function for pgfdw_get_result and
> > pgfdw_get_cleanup_result, based on the latter function as you
> > proposed, and modifying the two functions so that they call the
> > workhorse function?
>
> That's possible. We can revive pgfdw_get_cleanup_result() and
> make it call pgfdw_get_result_timed(). Also, with the patch,
> pgfdw_get_result() already works in that way. But I'm not sure
> how much we should be concerned about back-patch "issue"
> in this case. We usually rename the internal functions
> if new names are better.

I agree that if the name of an existing function was bad, we should
rename it, but I do not think the name pgfdw_get_cleanup_result is
bad; I think it is good in the sense that it well represents what the
function waits for.

The patch you proposed changes pgfdw_get_cleanup_result to cover the
timed_out==NULL case, but I do not think it is a good idea to rename
it for such a minor reason.  That function is used in all supported
versions, so that would just make back-patching hard.

> > @@ -1599,13 +1572,9 @@ pgfdw_finish_pre_commit_cleanup(List 
> > *pending_entries)
> >  entry = (ConnCacheEntry *) lfirst(lc);
> >
> >  /* Ignore errors (see notes in pgfdw_xact_callback) */
> > -   while ((res = PQgetResult(entry->conn)) != NULL)
> > -   {
> > -   PQclear(res);
> > -   /* Stop if the connection is lost (else we'll loop infinitely) 
> > */
> > -   if (PQstatus(entry->conn) == CONNECTION_BAD)
> > -   break;
> > -   }
> > +   pgfdw_get_result_timed(entry->conn, 0, , NULL);
> > +   PQclear(res);
> >
> > The existing code prevents interruption, but this would change to
> > allow it.  Do we need this change?
>
> You imply that we intentionally avoided calling CHECK_FOR_INTERRUPTS()
> there, don't you?

Yeah, this is intentional; in commit 04e706d42, I coded this to match
the behavior in the non-parallel-commit mode, which does not call
CHECK_FOR_INTERRUPTS().

> But could you tell me why?

My concern about doing so is that WaitLatchOrSocket is rather
expensive, so it might lead to useless overhead in most cases.
Anyway, this changes the behavior, so you should show the evidence
that this is useful.  I think this would be beyond refactoring,
though.

> > I have to agree with Horiguchi-san, because as mentioned by him, 1)
> > there isn't enough duplicate code in the two bits to justify merging
> > them into a single function, and 2) the 0002 patch rather just makes
> > code complicated.  The current implementation is easy to understand,
> > so I'd vote for leaving them alone for now.
> >
> > (I think the introduction of pgfdw_abort_cleanup is good, because that
> > de-duplicated much code that existed both in pgfdw_xact_callback and
> > in pgfdw_subxact_callback, which would outweigh the downside of
> > pgfdw_abort_cleanup that it made code somewhat complicated due to the
> > logic to handle both the transaction and subtransaction cases within
> > that single function.  But 0002 is not the case, I think.)
>
> The function pgfdw_exec_pre_commit() that I newly introduced consists
> of two parts: issue the transaction-end command based on the
> parallel_commit setting, and issue DEALLOCATE ALL. The first part is
> duplicated between pgfdw_xact_callback() and pgfdw_subxact_callback(),
> but the second is not (i.e., it's used only by pgfdw_xact_callback()).
> So how about getting rid of that non-duplicated part from
> pgfdw_exec_pre_commit()?

Seems like a good idea.
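
To make sure we are on the same page, what I have in mind is roughly the
following (an untested sketch; the signature is my reading of your patch,
so treat the details as an assumption):

/*
 * Issue the given transaction-end command, honoring parallel_commit.
 * The DEALLOCATE ALL part would be done directly in
 * pgfdw_xact_callback(), since only that path needs it.
 */
static void
pgfdw_exec_pre_commit(ConnCacheEntry *entry, const char *sql)
{
    if (entry->parallel_commit)
        do_sql_command_begin(entry->conn, sql);
    else
        do_sql_command(entry->conn, sql);
}

(The caller would still be responsible for adding the entry to
pending_entries in the parallel case, as it does now.)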

> > I have to agree with Horiguchi-san on this as well; the existing
> > single-purpose functions are easy to understand, so I'd vote for
> > leaving them alone.
>
> Ok, I will reconsider the 0003 patch. BTW, the parallel abort patch that
> you're proposing seems to add a new function pgfdw_finish_abort_cleanup()
> with a similar structure to the function added by the 0003 patch.
> So probably it's helpful for us to consider these together :)

Ok, let us discuss this after the parallel-abort patch.

Sorry for the late reply again.

Best regards,
Etsuro Fujita




Re: [PATCH] psql: Add tab-complete for optional view parameters

2023-01-29 Thread Mikhail Gribkov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, failed
Documentation:tested, passed

Hi Christoph,

The patch has potential, although I have to agree with Jim Jones: it 
obviously has a problem with "alter view  set" handling.

First of all, the user can notice that SET and RESET alternatives are proposed 
in an absolutely equivalent way:
postgres=# alter view VVV 
ALTER COLUMN  OWNER TO  RENAMERESET (   SET ( SET SCHEMA

But completion of the parenthesis differs for these alternatives:

postgres=# alter view VVV reset -> completes with "(" -> 
CHECK_OPTION  SECURITY_BARRIER  SECURITY_INVOKER

postgres=# alter view VVV set -> completes with a single space -> 
Display all 188 possibilities? (y or n)
(and these 188 possibilities do not contain "(")

The problem is obviously in the newly added second line of the following clause:
COMPLETE_WITH("ALTER COLUMN", "OWNER TO", "RENAME",
  "SET SCHEMA", "SET (", "RESET (");

"set schema" and "set (" alternatives are competing, while completion of the 
common part "set" leads to a string composition which does not have the 
check branch (Matches("ALTER", "VIEW", MatchAny, "SET")).

I think it may be worth looking at the "alter materialized view" completion 
tree and making "alter view" work the same way.

The new status of this patch is: Waiting on Author


Bug #17759: GENERATED columns not computed during MERGE

2023-01-29 Thread Dean Rasheed
I spent a little time investigating bug #17759 [1] in more detail.
Initially, I thought that it had been fixed by 3f7836ff65, but it
turns out that's not the case.

[1] 
https://www.postgresql.org/message-id/17759-e76d9bece1b5421c%40postgresql.org

The immediate cause of the bug was that, before 3f7836ff65, the set of
generated columns to be updated depended on extraUpdatedCols from the
target RTE, and for MERGE, this was not being populated. 3f7836ff65
appeared to fix that (it fixes the test case in the bug report) by no
longer relying on rte->extraUpdatedCols, but unfortunately there's a
little more to it than that.

Since 3f7836ff65, ExecInitModifyTable() calls
ExecInitStoredGenerated() if the command is an INSERT or an UPDATE,
but not if it's a MERGE. This means that the generated column info
doesn't get built until later (when a merge action actually executes
for the first time). If the first merge action to execute is an
UPDATE, and no updated columns require generated columns to be
recomputed, then ExecInitStoredGenerated() will skip those generated
columns and not generate ri_GeneratedExprs / ri_extraUpdatedCols info
for them. That's a problem, however, since the MERGE might also
contain an INSERT that gets executed later, for which it isn't safe to
skip any of the generated columns. Here's a simple reproducer:

CREATE TABLE t (
  id int PRIMARY key,
  val int,
  str text,
  upper_str text GENERATED ALWAYS AS (upper(str)) STORED
);

INSERT INTO t VALUES (1, 10, 'orig');

MERGE INTO t
  USING (VALUES (1, 100), (2, 200)) v(id, val) ON t.id = v.id
  WHEN MATCHED THEN UPDATE SET val = v.val
  WHEN NOT MATCHED THEN INSERT VALUES (v.id, v.val, 'new');

SELECT * FROM t;

 id | val | str  | upper_str
+-+--+---
  1 | 100 | orig | ORIG
  2 | 200 | new  |
(2 rows)


So we need to ensure that ExecInitModifyTable() calls
ExecInitStoredGenerated() for MERGE. Passing CMD_MERGE to
ExecInitStoredGenerated() is good enough, since anything other than
CMD_UPDATE causes it to not skip any generated columns. That could be
improved by examining the merge action list (it would be OK to skip
generated columns as long as the MERGE didn't contain an INSERT
action), but I don't think it's worth the extra effort / risk.

So I think we need the attached in HEAD and v15.

Regards,
Dean
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
new file mode 100644
index f419c47..1ac6517
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4141,12 +4141,12 @@ ExecInitModifyTable(ModifyTable *node, E
 		}
 
 		/*
-		 * For INSERT and UPDATE, prepare to evaluate any generated columns.
+		 * For INSERT/UPDATE/MERGE, prepare to evaluate any generated columns.
 		 * We must do this now, even if we never insert or update any rows,
 		 * because we have to fill resultRelInfo->ri_extraUpdatedCols for
 		 * possible use by the trigger machinery.
 		 */
-		if (operation == CMD_INSERT || operation == CMD_UPDATE)
+		if (operation == CMD_INSERT || operation == CMD_UPDATE || operation == CMD_MERGE)
 			ExecInitStoredGenerated(resultRelInfo, estate, operation);
 	}
 
diff --git a/src/test/regress/expected/generated.out b/src/test/regress/expected/generated.out
new file mode 100644
index 9867748..11940c8
--- a/src/test/regress/expected/generated.out
+++ b/src/test/regress/expected/generated.out
@@ -192,6 +192,26 @@ SELECT * FROM gtest1 ORDER BY a;
  3 | 6
 (1 row)
 
+-- test MERGE
+CREATE TABLE gtestm (
+  id int PRIMARY KEY,
+  f1 int,
+  f2 int,
+  f3 int GENERATED ALWAYS AS (f1 * 2) STORED,
+  f4 int GENERATED ALWAYS AS (f2 * 2) STORED
+);
+INSERT INTO gtestm VALUES (1, 5, 100);
+MERGE INTO gtestm t USING (VALUES (1, 10), (2, 20)) v(id, f1) ON t.id = v.id
+  WHEN MATCHED THEN UPDATE SET f1 = v.f1
+  WHEN NOT MATCHED THEN INSERT VALUES (v.id, v.f1, 200);
+SELECT * FROM gtestm ORDER BY id;
+ id | f1 | f2  | f3 | f4  
+++-++-
+  1 | 10 | 100 | 20 | 200
+  2 | 20 | 200 | 40 | 400
+(2 rows)
+
+DROP TABLE gtestm;
 -- views
 CREATE VIEW gtest1v AS SELECT * FROM gtest1;
 SELECT * FROM gtest1v;
diff --git a/src/test/regress/sql/generated.sql b/src/test/regress/sql/generated.sql
new file mode 100644
index 92d373b..87ec842
--- a/src/test/regress/sql/generated.sql
+++ b/src/test/regress/sql/generated.sql
@@ -81,6 +81,21 @@ SELECT * FROM gtest1 ORDER BY a;
 DELETE FROM gtest1 WHERE b = 2;
 SELECT * FROM gtest1 ORDER BY a;
 
+-- test MERGE
+CREATE TABLE gtestm (
+  id int PRIMARY KEY,
+  f1 int,
+  f2 int,
+  f3 int GENERATED ALWAYS AS (f1 * 2) STORED,
+  f4 int GENERATED ALWAYS AS (f2 * 2) STORED
+);
+INSERT INTO gtestm VALUES (1, 5, 100);
+MERGE INTO gtestm t USING (VALUES (1, 10), (2, 20)) v(id, f1) ON t.id = v.id
+  WHEN MATCHED THEN UPDATE SET f1 = v.f1
+  WHEN NOT MATCHED THEN INSERT VALUES (v.id, v.f1, 200);
+SELECT * FROM gtestm ORDER BY id;
+DROP TABLE gtestm;
+
 -- views
 CREATE VIEW gtest1v AS SELECT * FROM gtest1;
 SELECT * FROM