Re: BF assertion failure on mandrill in walsender, v13

2021-06-09 Thread Noah Misch
On Thu, Jun 10, 2021 at 10:47:20AM +1200, Thomas Munro wrote:
> Not sure if there is much chance of debugging this one-off failure
> without a backtrace (long shot: any chance there's still a core
> file?)

No; it was probably in a directory deleted for each run.  One would need to
add dbx support to the buildfarm client, or perhaps add support for saving
build directories when there's a core, so I can operate dbx manually.




Re: Parallel INSERT SELECT take 2

2021-06-09 Thread Greg Nancarrow
On Thu, Jun 10, 2021 at 11:26 AM houzj.f...@fujitsu.com
 wrote:
>
> After further review, and thanks to Greg-san's suggestions,
> I attached a new version of the patchset with some minor changes in 0001, 0003 and
> 0004.
>
> 0001.
> * fix a typo in a variable name.
> * add a TODO in the patch comment about updating the version number when branching
> PG15.
>
> 0003
> * fix a 'git apply white space' warning.
> * Remove some unnecessary if conditions.
> * add some code comments above the safety check function.
> * Fix some typos.
>
> 0004
> * add a testcase to test ALTER PARALLEL DML UNSAFE/RESTRICTED.
>

Thanks,  those updates addressed most of what I was going to comment
on for the v9 patches.

Some additional comments on the v10 patches:

(1) I noticed some functions in the 0003 patch have no function header:

   make_safety_object
   parallel_hazard_walker
   target_rel_all_parallel_hazard_recurse

(2) I found the "recurse_partition" parameter of the
target_rel_all_parallel_hazard_recurse() function a bit confusing,
because the function recursively checks partitions without looking at
that flag. How about naming it "is_partition"?

(3) The names of the utility functions don't convey that they operate on tables.

How about:

   pg_get_parallel_safety() -> pg_get_table_parallel_safety()
   pg_get_max_parallel_hazard() -> pg_get_table_max_parallel_hazard()

or pg_get_rel_x()?

What do you think?

(4) I think that some of the tests need parallel dml settings to match
their expected output:

(i)
+-- Test INSERT with underlying query - and RETURNING (no projection)
+-- (should create a parallel plan; parallel SELECT)

-> but it currently creates a serial plan (the test needs to set parallel dml
safe so that a parallel plan is created)

(ii)
+-- Parallel INSERT with unsafe column default, should not use a parallel plan
+--
+alter table testdef parallel dml safe;

-> should set "unsafe" not "safe"

(iii)
+-- Parallel INSERT with restricted column default, should use parallel SELECT
+--
+explain (costs off) insert into testdef(a,b,d) select a,a*2,a*8 from test_data;

-> should use "alter table testdef parallel dml restricted;" before the explain

(iv)
+--
+-- Parallel INSERT with restricted and unsafe column defaults, should
not use a parallel plan
+--
+explain (costs off) insert into testdef(a,d) select a,a*8 from test_data;

-> should use "alter table testdef parallel dml unsafe;"  before the explain


Regards,
Greg Nancarrow
Fujitsu Australia




Re: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-06-09 Thread Amit Kapila
On Wed, Jun 9, 2021 at 9:47 PM Robert Haas  wrote:
>
> On Wed, Jun 9, 2021 at 2:43 AM Tom Lane  wrote:
> > There are specific cases where there's a good reason to worry.
> > For example, if we assume blindly that domain_in() is parallel
> > safe, we will have cause to regret that.  But I don't find that
> > to be a reason why we need to lock down everything everywhere.
> > We need to understand the tradeoffs involved in what we check,
> > and apply checks that are likely to avoid problems, while not
> > being too nanny-ish.
>
> Yeah, that's exactly how I feel about it, too.
>

Fair enough. So, I think there is a consensus to drop this patch, and
if one wants, we can document these cases. Also, we don't want it to
enable parallelism for Inserts, where we are instead pursuing the
approach of having a flag in pg_class which allows users to specify
whether writes are allowed on a specified relation.

-- 
With Regards,
Amit Kapila.




RE: locking [user] catalog tables vs 2pc vs logical rep

2021-06-09 Thread osumi.takami...@fujitsu.com
On Thursday, June 10, 2021 1:14 PM vignesh C 
> On Wed, Jun 9, 2021 at 12:03 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Wednesday, June 9, 2021 12:06 PM Amit Kapila
>  wrote:
> > > On Tue, Jun 8, 2021 at 6:24 PM vignesh C  wrote:
> > > >
> > > > Thanks for the updated patch.
> > > >
> > > > I have few comments:
> > > > 1) Should we list the actual system tables like
> > > > pg_class,pg_trigger, etc instead of any other catalog table?
> > > > User has issued an explicit LOCK on pg_class (or any other catalog
> > > > table)
> > > >
> > >
> > > I think the way it is mentioned is okay. We don't need to specify
> > > other catalog tables.
> > Okay.
> >
> >
> > > > 2) Here This means deadlock, after this we mention deadlock again
> > > > for each of the examples, we can remove it if redundant.
> > > > This can happen in the following ways:
> > I think this sentence works to notify that commands described below
> > are major scenarios naturally, to the readers. Then, I don't want to remove
> it.
> >
> > If you somehow feel that the descriptions are redundant, how about
> > unifying all listitems as nouns. like below ?
> >
> > * An explicit LOCK on
> > pg_class (or any other catalog table) in a
> > transaction
> > * Reordering pg_class by
> > CLUSTER command in a transaction
> > * Executing TRUNCATE on user_catalog_table
> >
> 
> This looks good to me. Keep the 2PC documentation patch also on the same
> lines.
Yeah, of course. Thanks for your confirmation.


Best Regards,
Takamichi Osumi



Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options

2021-06-09 Thread Michael Paquier
On Thu, Jun 10, 2021 at 09:17:55AM +0530, Bharath Rupireddy wrote:
> Hm. I get it. Unfortunately the commit b1ff33f is missing information
> on what the coverity tool was complaining of and it has no related
> discussion at all.

This came from a FORWARD_NULL complaint, due to the fact that
parse_subscription_options() has checks for all three options if
connect is non-NULL a bit down after being done with the value
assignments with the DefElems.  So coverity was warning that we'd
better be careful to always have all three pointers set if a
connection is wanted by the caller.
--
Michael




Re: locking [user] catalog tables vs 2pc vs logical rep

2021-06-09 Thread vignesh C
On Wed, Jun 9, 2021 at 12:03 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Wednesday, June 9, 2021 12:06 PM Amit Kapila  
> wrote:
> > On Tue, Jun 8, 2021 at 6:24 PM vignesh C  wrote:
> > >
> > > Thanks for the updated patch.
> > >
> > > I have few comments:
> > > 1) Should we list the actual system tables like pg_class,pg_trigger,
> > > etc instead of any other catalog table?
> > > User has issued an explicit LOCK on pg_class (or any other catalog
> > > table)
> > >
> >
> > I think the way it is mentioned is okay. We don't need to specify other 
> > catalog
> > tables.
> Okay.
>
>
> > > 2) Here This means deadlock, after this we mention deadlock again for
> > > each of the examples, we can remove it if redundant.
> > > This can happen in the following ways:
> I think this sentence works to notify that commands described below
> are major scenarios naturally, to the readers. Then, I don't want to remove 
> it.
>
> If you somehow feel that the descriptions are redundant,
> how about unifying all listitems as nouns. like below ?
>
> * An explicit LOCK on pg_class 
> (or any other catalog table) in a transaction
> * Reordering pg_class by CLUSTER 
> command in a transaction
> * Executing TRUNCATE on user_catalog_table
>

This looks good to me. Keep the 2PC documentation patch also on the same lines.
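
(For context, the first scenario amounts to something like the following, with
synchronous logical replication or two-phase decoding in play -- a hypothetical
sketch with a placeholder table name, not proposed documentation text:)

   BEGIN;
   LOCK TABLE pg_class;               -- explicit lock on a catalog table
   INSERT INTO published_tab VALUES (1);
   PREPARE TRANSACTION 'p1';          -- decoding this transaction also needs to
                                      -- read pg_class, which can end in a deadlock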

Regards,
Vignesh




Re: logical replication of truncate command with trigger causes Assert

2021-06-09 Thread Amit Kapila
On Wed, Jun 9, 2021 at 8:44 PM Mark Dilger  wrote:
>
> > On Jun 9, 2021, at 7:52 AM, Tom Lane  wrote:
> >
> > Here's a draft patch for that.  I decided the most sensible way to
> > organize this is to pair the existing ensure_transaction() subroutine
> > with a cleanup subroutine.  Rather unimaginatively, perhaps, I renamed
> > it to begin_transaction_step and named the cleanup end_transaction_step.
> > (Better ideas welcome.)
>
> Thanks!  The regression test I posted earlier passes with this patch applied.
>

I have also read the patch and it looks good to me.

> > Somewhat unrelated, but ... am I reading the code correctly that
> > apply_handle_stream_start and related routines are using Asserts
> > to check that the remote sent stream-control messages in the correct
> > order?
> >

Yes. I think you are talking about Assert(!in_streamed_transaction).
There is no particular reason that such Asserts are required, so we
can change to test-and-elog as you suggested later in your email.

>  That seems many degrees short of acceptable.
>
> Even if you weren't reading that correctly, this bit:
>
> xid = pq_getmsgint(s, 4);
>
> Assert(TransactionIdIsValid(xid));
>
> simply asserts that the sending server didn't send an invalid subtransaction 
> id.
>

This also needs to be changed to test-and-elog.

-- 
With Regards,
Amit Kapila.




Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options

2021-06-09 Thread Bharath Rupireddy
On Thu, Jun 10, 2021 at 8:55 AM Peter Smith  wrote:
> > > 2.
> > > + /* If connect option is supported, the others also need to be. */
> > > + Assert(!IsSet(supported_opts, SUBOPT_CONNECT) ||
> > > +(IsSet(supported_opts, SUBOPT_ENABLED) &&
> > > + IsSet(supported_opts, SUBOPT_CREATE_SLOT) &&
> > > + IsSet(supported_opts, SUBOPT_COPY_DATA)));
> > >
> > > This comment about "the others" doesn’t make sense to me.
> > >
> > > e.g. Why only these 3 options? What about all those other SUBOPT_* 
> > > options?
> >
> > It is an existing Assert and comment for ensuring somebody doesn't
> > call parse_subscription_options with SUBOPT_CONNECT, without
> > SUBOPT_ENABLED, SUBOPT_CREATE_SLOT and SUBOPT_COPY_DATA. In other
> > words, when SUBOPT_CONNECT is passed in, the other three options
> > should also be passed. " the others" there in the comment makes sense
> > just by looking at the Assert statement.
> >
>
> This misses the point of my question. And deducing the meaning of
> "the others" from the code is completely backwards! The comment
> describes the code. The code doesn't describe the comment.
>
> Again, I was asking why “the others” are only these 3 options? What
> about binary? What about streaming? What about refresh?
> In other words - what was the *intent* of that comment, and does the
> new code still meet the requirements of that intent? I think it does
> not.
>
> If you look at GitHub [1], where that code was first implemented, you can
> see that “the others” referred to every option other than
> “connect”. At that time, the only others were those 3 - enabled,
> create_slot, copy_data. But now there are lots more options so
> something definitely needs to change.
> E.g.
> - Maybe the Assert now needs to include all the new options as well?
> - Maybe the entire reason for the Assert has become redundant now due
> to the introduction of SubOpts. It looks like it was not functional
> code - just something to quieten a static analysis tool.
> - Certainly “the others” is too vague and no longer has the original
> meaning anymore
>
> I don't know the answer; my guess is that all this has become obsolete
> and the whole Assert and the dodgy comment can just be deleted.

Hm. I get it. Unfortunately the commit b1ff33f is missing information
on what the coverity tool was complaining of and it has no related
discussion at all.

I agree to remove that assertion entirely. I will post a new patch set soon.

> > > 3.
> > > I feel that this patch should be split into 2 parts
> > > a) the SubOpts changes, and
> > > b) the mutually exclusive options change.
> >
> > Divided the patch into two.
> >
> > > I agree that the new SubOpts struct etc. is an improvement over existing 
> > > code.
> > >
> > > But, for the mutually exclusive options part I don't see what is
> > > gained by the new patch code. I preferred the old code with its
> > > multiple ereports. Although it was a bit repetitive IMO it was easier
> > > to read that way, and length-wise there is almost no difference. So if
> > > it is less readable and not a lot shorter then what is the benefit of
> > > the change?
> >
> > I personally don't like the repeated code when there's a chance of
> > doing it better. It might not reduce the loc, but it removes the many
> > similar ereport(ERROR calls. PSA v4-0002 patch. I think the committer
> > can take a call on it.
> >
>
> Thanks for splitting them. My votes are +1 for patch 0001 and  -1 for
> patch 0002. As you say, someone else can decide.

Let's see how it goes further.

With Regards,
Bharath Rupireddy.




Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options

2021-06-09 Thread Peter Smith
On Thu, Jun 10, 2021 at 1:28 AM Bharath Rupireddy
 wrote:
>
> On Wed, Jun 9, 2021 at 10:37 AM Peter Smith  wrote:
> >
[...]

I checked the v4* patches.
Everything applies and builds and tests OK for me.

> > 2.
> > + /* If connect option is supported, the others also need to be. */
> > + Assert(!IsSet(supported_opts, SUBOPT_CONNECT) ||
> > +(IsSet(supported_opts, SUBOPT_ENABLED) &&
> > + IsSet(supported_opts, SUBOPT_CREATE_SLOT) &&
> > + IsSet(supported_opts, SUBOPT_COPY_DATA)));
> >
> > This comment about "the others" doesn’t make sense to me.
> >
> > e.g. Why only these 3 options? What about all those other SUBOPT_* options?
>
> It is an existing Assert and comment for ensuring somebody doesn't
> call parse_subscription_options with SUBOPT_CONNECT, without
> SUBOPT_ENABLED, SUBOPT_CREATE_SLOT and SUBOPT_COPY_DATA. In other
> words, when SUBOPT_CONNECT is passed in, the other three options
> should also be passed. " the others" there in the comment makes sense
> just by looking at the Assert statement.
>

This misses the point of my question. And deducing the meaning of
"the others" from the code is completely backwards! The comment
describes the code. The code doesn't describe the comment.

Again, I was asking why “the others” are only these 3 options? What
about binary? What about streaming? What about refresh?
In other words - what was the *intent* of that comment, and does the
new code still meet the requirements of that intent? I think it does
not.

If you look at GitHub [1], where that code was first implemented, you can
see that “the others” referred to every option other than
“connect”. At that time, the only others were those 3 - enabled,
create_slot, copy_data. But now there are lots more options so
something definitely needs to change.
E.g.
- Maybe the Assert now needs to include all the new options as well?
- Maybe the entire reason for the Assert has become redundant now due
to the introduction of SubOpts. It looks like it was not functional
code - just something to quieten a static analysis tool.
- Certainly “the others” is too vague and no longer has the original
meaning anymore

I don't know the answer; my guess is that all this has become obsolete
and the whole Assert and the dodgy comment can just be deleted.

> > 3.
> > I feel that this patch should be split into 2 parts
> > a) the SubOpts changes, and
> > b) the mutually exclusive options change.
>
> Divided the patch into two.
>
> > I agree that the new SubOpts struct etc. is an improvement over existing 
> > code.
> >
> > But, for the mutually exclusive options part I don't see what is
> > gained by the new patch code. I preferred the old code with its
> > multiple ereports. Although it was a bit repetitive IMO it was easier
> > to read that way, and length-wise there is almost no difference. So if
> > it is less readable and not a lot shorter then what is the benefit of
> > the change?
>
> I personally don't like the repeated code when there's a chance of
> doing it better. It might not reduce the loc, but it removes the many
> similar ereport(ERROR calls. PSA v4-0002 patch. I think the committer
> can take a call on it.
>

Thanks for splitting them. My votes are +1 for patch 0001 and  -1 for
patch 0002. As you say, someone else can decide.

--
[1] 
https://github.com/postgres/postgres/commit/b1ff33fd9bb82937f4719f264972e6a3c83cdb89#

Kind Regards,
Peter Smith
Fujitsu Australia.




Re: Multiple hosts in connection string failed to failover in non-hot standby mode

2021-06-09 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Jun 09, 2021 at 12:05:10PM -0400, Tom Lane wrote:
>> Here's a draft patch that renames regress_ecpg_user2 to ecpg2_regression,

> Using ecpg2_regression for the role goes a bit against the recent rule
> to not create any role not prefixed with "regress_" as part of the
> regression tests, but I am fine to live with that here.

Oh dear, I forgot to check that carefully.  I'd been thinking the rule was
that such names must *contain* "regress", but looking at user.c, it's
stricter:

#ifdef ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
    if (strncmp(stmt->role, "regress_", 8) != 0)
        elog(WARNING, "roles created by regression test cases should have names starting with \"regress_\"");
#endif

Meanwhile, the rule for database names is:

#ifdef ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
    if (IsUnderPostmaster && strstr(dbname, "regression") == NULL)
        elog(WARNING, "databases created by regression test cases should have names including \"regression\"");
#endif

So unless we want to relax one or both of those, we can't have a user
name that matches the database name.

Now I'm inclined to go back to the first-draft patch I had, which just
dropped the first problematic test case, and added gssencmode=disable
to the second one.

regards, tom lane




Re: Patch: Range Merge Join

2021-06-09 Thread David Rowley
On Thu, 10 Jun 2021 at 03:05, Thomas  wrote:
> We have implemented the Range Merge Join algorithm by extending the
> existing Merge Join to also support range conditions, i.e., BETWEEN-AND
> or @> (containment for range types).

It shouldn't be a blocker for you, but just so you're aware, there was
a previous proposal for this in [1] and a patch in [2].  I've included
Jeff here just so he's aware of this. Jeff may wish to state his
intentions with his own patch. It's been a few years now.

I only just glanced over the patch. I'd suggest getting rid of the /*
Thomas */ comments.  We use git, so if you need an audit trail about
changes then you'll find it in git blame.  If you have those for an
internal audit trail then you should consider using git. No committer
would commit those to PostgreSQL, so they might as well disappear.

For further review, please add the patch to the July commitfest [3].
We should be branching for pg15 sometime before the start of July.
There will be more focus on new patches around that time. Further
details in [4].

Also, I see this is your first post to this list, so welcome, and
thank you for the contribution.  Also, just to set expectations;
patches like this almost always take a while to get into shape for
PostgreSQL. Please expect a lot of requests to change things. That's
fairly standard procedure.  The process often drags on for months and
in some less common cases, years.

David

[1] 
https://www.postgresql.org/message-id/flat/6227.1334559170%40sss.pgh.pa.us#82c771950ba486dec911923a5e91
[2] 
https://www.postgresql.org/message-id/flat/CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg%40mail.gmail.com
[3] https://commitfest.postgresql.org/33/
[4] https://wiki.postgresql.org/wiki/CommitFest




Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

2021-06-09 Thread Amit Kapila
On Thu, Jun 10, 2021 at 4:13 AM Jeff Davis  wrote:
>
> On Wed, 2021-06-09 at 17:27 +0530, Amit Kapila wrote:
> > 2. In the main patch [1], we do send two_phase option even during
> > START_REPLICATION for the very first time when the two_phase can be
> > enabled. There are reasons as described in the worker.c why we can't
> > enable it during CREATE_REPLICATION_SLOT.
>
> I'll have to catch up on the thread to digest that reasoning and how it
> applies to decoding vs. replication. But there don't seem to be changes
> to START_REPLICATION for twophase, so I don't quite follow your point.
>

I think it is because there we pass it as an option, as I have suggested
doing in the case of CREATE_REPLICATION_SLOT.

> Are you saying that we should not be able to create slots with twophase
> at all until the rest of the changes go in?
>

No, the slots will be created, but the two_phase option will be enabled
only after the initial tablesync is complete.

> > Now, if we want to do
> > protocol changes, I wonder why only do some changes and leave the
> > rest
> > for the next version?
>
> I started this thread because it's possible to create a slot a certain
> way using the SQL function create_logical_replication_slot(), but it's
> impossible over the replication protocol. That seems inconsistent to
> me.
>

Right, I understand that, but on the protocol side there are a few more
things to be considered to allow subscribers to enable two_phase.
However, maybe, for now, we can do it just for create_replication_slot,
and the start_replication stuff required for subscribers can be done
later. I was not completely sure if that is a good idea.

-- 
With Regards,
Amit Kapila.




Re: alter table set TABLE ACCESS METHOD

2021-06-09 Thread Justin Pryzby
On Wed, Jun 09, 2021 at 01:45:52PM -0700, Zhihong Yu wrote:
> +   /* check if another access method change was already requested */
> +   if (tab->newAccessMethod)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +errmsg("cannot change access method setting twice")));
> 
> I think the error message can be refined - changing  access method twice is
> supported, as long as the two changes don't overlap.

That language is consistent with existing errors.

src/backend/commands/tablecmds.c: errmsg("cannot change persistence setting twice")));
src/backend/commands/tablecmds.c: errmsg("cannot change persistence setting twice")));
 errmsg("cannot alter type of column \"%s\" twice",

However, I think that SET TABLESPACE is a better template to follow:
 errmsg("cannot have multiple SET TABLESPACE 
subcommands")));

Michael pointed out that there's two, redundant checks:

+   if (rel->rd_rel->relam == amoid)
+   return;
+  
+   /* Save info for Phase 3 to do the real work */
+   if (OidIsValid(tab->newAccessMethod))
+   ereport(ERROR,
+   (errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("cannot have multiple SET ACCESS METHOD 
subcommands")));

I think that the "multiple subcommands" test should be before the "no-op" test.

@Jeff: In my original patch, I also proposed patches 2,3:

Subject: [PATCH v2 2/3] Allow specifying access method of partitioned tables..
Subject: [PATCH v2 3/3] Implement lsyscache get_rel_relam() 
 




Re: alter table set TABLE ACCESS METHOD

2021-06-09 Thread Justin Pryzby
On Wed, Jun 09, 2021 at 01:47:18PM +0900, Michael Paquier wrote:
> On Tue, Jun 08, 2021 at 05:33:31PM -0700, Jeff Davis wrote:
> > New version attached, with the detoasting code removed. Could use
> > another round of validation/cleanup in case I missed something during
> > the merge.
> 
> This looks rather sane to me, thanks.
> 
> /* Create the transient table that will receive the re-ordered data */
> OIDNewHeap = make_new_heap(tableOid, tableSpace,
> +  accessMethod
> It strikes me that we could extend CLUSTER/VACUUM FULL to support this
> option, in the same vein as TABLESPACE.  Perhaps that's not something to
> implement or have, just wanted to mention it.

It's a good thought.  But remember that c5b28604 handled REINDEX
(TABLESPACE) but not CLUSTER/VACUUM FULL (TABLESPACE).  You wrote:
https://www.postgresql.org/message-id/YBuWbzoW6W7AaMv0%40paquier.xyz
> Regarding the VACUUM and CLUSTER cases, I am not completely sure if
> going through these for a tablespace case is the best move we can do,
> as ALTER TABLE is able to mix multiple operations and all of them
> require already an AEL to work.  REINDEX was different thanks to the
> case of CONCURRENTLY.  Anyway, as a lot of work has been done here
> already, I would recommend to create new threads about those two
> topics.  I am also closing this patch in the CF app.

In any case, I think we really want to have an ALTER .. SET ACCESS METHOD.
Supporting it also in CLUSTER/VACUUM is an optional, additional feature.

-- 
Justin




Re: Move pg_attribute.attcompression to earlier in struct for reduced size?

2021-06-09 Thread Michael Paquier
On Tue, Jun 08, 2021 at 11:32:21PM -0400, Tom Lane wrote:
> I can imagine sometime in the future where we need to get rid of all
> instances of pglz so we can reassign that compression code to something
> else.  But would we insist on a mass VACUUM FULL to make that happen?
> Doubt it.  You'd want a tool that would make that happen over time,
> in the background; like the mechanisms that have been discussed for
> enabling checksums on-the-fly.

Well, I can imagine that some people could afford being more
aggressive here even if it implies some downtime, and if they are not
willing to afford the deployment of a second instance for a
dump/restore or a logical-replication setup.

(The parallel with data checksums is partially true, as you can do a
switch of checksums with physical replication, since a page's checksum
is only written when the page is pushed out of shared buffers, not when
it is written into WAL.  This needs a second instance, of course.)

> In the meantime I'm +1 for dropping this logic from VACUUM FULL.
> I don't even want to spend enough more time on it to confirm the
> different overhead measurements that have been reported.

Agreed.  It looks like we are heading toward doing just that for this
release.
--
Michael




Re: RFC: Logging plan of the running query

2021-06-09 Thread torikoshia

On 2021-06-09 23:04, Fujii Masao wrote:

Thanks for your review!


auto_explain can log the plan of even nested statement
if auto_explain.log_nested_statements is enabled.
But ISTM that pg_log_current_plan() cannot log that plan.
Is this intentional?



I think that it's better to make pg_log_current_plan() log
the plan of even nested statement.


+1. It would be better.
But currently the plan information is obtained from ActivePortal, and
ISTM there is no easy way to retrieve the plan information of nested
statements from ActivePortal.

Anyway I'll do some more research.
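
(For reference, the intended usage would be along these lines -- hypothetical,
since the function is still being worked on in this patch:)

   SELECT pg_log_current_plan(pid)
     FROM pg_stat_activity
    WHERE state = 'active' AND pid <> pg_backend_pid();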


I think you are right about the following comments.
I'll fix them.


+   es->format = EXPLAIN_FORMAT_TEXT;
+   es->settings = true;

Since pg_log_current_plan() is usually used to investigate and
trouble-shoot the long running queries, IMO it's better to
enable es->verbose by default and get additional information
about the queries. Thought?
+ * pg_log_current_plan
+ * Signal a backend process to log plan the of running query.

"plan the of" is typo?


+ * Only superusers are allowed to signal to log plan because any users 
to
+ * issue this request at an unbounded rate would cause lots of log 
messages

+ * and which can lead to denial of service.

"because any users" should be "because allowing any users"
like the comment for pg_log_backend_memory_contexts()?


+ * All the actual work is deferred to ProcessLogExplainInterrupt(),

"ProcessLogExplainInterrupt()" should be 
"ProcessLogCurrentPlanInterrupt()"?


Regards,


--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-09 Thread Kyotaro Horiguchi
At Tue, 8 Jun 2021 08:45:24 +, "tsunakawa.ta...@fujitsu.com" 
 wrote in 
> From: Kyotaro Horiguchi 
> > I think the discussion is based the behavior that any process that is
> > responsible for finishing the 2pc-commit continue retrying remote
> > commits until all of the remote-commits succeed.
> 
> Thank you for coming back.  We're talking about the first attempt to prepare 
> and commit in each transaction, not the retry case.

If we accept that each elementary commit (via an FDW connection) can
fail, there's no way the parent (root?) 2pc-commit can succeed.  How can
we ignore the fdw-error in that case?

> > > Throws: HeuristicMixedException
> > > Thrown to indicate that a heuristic decision was made and that some
> > relevant updates have been
> > > committed while others have been rolled back.
> 
> > I'm not sure about how JTA works in detail, but doesn't
> > UserTransaction.commit() return HeuristicMixedException when some of
> > relevant updates have been committed but other not? Isn't it the same
> > state with the case where some of the remote servers failed on
> > remote-commit while others are succeeded?
> 
> No.  Taking the description literally and considering the relevant XA 
> specification, it's not about the remote commit failure.  The remote server 
> is not allowed to fail the commit once it has reported successful prepare, 
> which is the contract of 2PC.  HeuristicMixedException is about the manual 
> resolution, typically by the DBA, using the DBMS-specific tool or the 
> standard commit()/rollback() API.

Mmm. The above seems as if saying that 2pc-commit does not interact
with remotes.  The interface contract does not cover everything that
happens in the real world.  If remote-commit fails, that is just an
issue outside of the 2pc world.  In reality remote-commit may fail for
all sorts of reasons.

https://www.ibm.com/docs/ja/db2-for-zos/11?topic=support-example-distributed-transaction-that-uses-jta-methods

>  }  catch (javax.transaction.xa.XAException xae)
>  { // Distributed transaction failed, so roll it back.
>// Report XAException on prepare/commit.

This suggests that both XAResource.prepare() and commit() can throw an
exception.

> > (I guess that
> > UserTransaction.commit() would throw RollbackException if
> > remote-prepare has been failed for any of the remotes.)
> 
> Correct.

So UserTransaction.commit() does not throw the same exception if
remote-commit fails.  Isn't HeuristicMixedException the exception
thrown in that case?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: alter table set TABLE ACCESS METHOD

2021-06-09 Thread Michael Paquier
On Wed, Jun 09, 2021 at 01:45:52PM -0700, Zhihong Yu wrote:
> +   /* check if another access method change was already requested */
> +   if (tab->newAccessMethod)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +errmsg("cannot change access method setting twice")));
> 
> I think the error message can be refined - changing  access method twice is
> supported, as long as the two changes don't overlap.

Hmm.  Do we actually need this one?  ATPrepSetAccessMethod() checks
already that this command cannot be specified multiple times, with an
error message consistent with what SET TABLESPACE does.
--
Michael




Re: please update ps display for recovery checkpoint

2021-06-09 Thread Michael Paquier
On Sun, Jun 06, 2021 at 09:13:48PM -0500, Justin Pryzby wrote:
> Putting this into fd.c seems to assume that we can clobber "ps", which is fine
> when called by StartupXLOG(), but it's a public interface, so I'm not sure if
> it's okay to assume that's the only caller.  Maybe it should check if
> MyAuxProcType == B_STARTUP.

I would be tempted to just add that into StartupXLOG() rather than
implying that callers of SyncDataDirectory() are fine to get their ps
output enforced all the time.
--
Michael




Re: please update ps display for recovery checkpoint

2021-06-09 Thread Michael Paquier
On Mon, Jun 07, 2021 at 01:28:06PM -0400, Robert Haas wrote:
> On Mon, Jun 7, 2021 at 12:02 PM Bossart, Nathan  wrote:
>> I've seen a few functions cause lengthy startups, including
>> SyncDataDirectory() (for which I was grateful to see 61752afb),
>> StartupReorderBuffer(), and RemovePgTempFiles().  I like the idea of
>> adding additional information in the ps title, but I also think it is
>> worth exploring additional ways to improve on these O(n) startup
>> tasks.

+1.  I also agree with doing something for the ps output of the
startup process when these are happening in crash recovery.

> See also the nearby thread entitled "when the startup process doesn't"
> which touches on this same issue.

Here is a link to the thread:
https://www.postgresql.org/message-id/ca+tgmoahqrgdfobwgy16xcomtxxsrvgfb2jncvb7-ubuee1...@mail.gmail.com
--
Michael




a path towards replacing GEQO with something better

2021-06-09 Thread John Naylor
Hi,

On occasion it comes up that the genetic query optimizer (GEQO) doesn't
produce particularly great plans, and is slow ([1] for example). The only
alternative that has gotten as far as a prototype patch (as far as I know)
is simulated annealing some years ago, which didn't seem to get far.

The join problem as it pertains to Postgres has been described within the
community in
[Gustaffson, 2017] and [Stroffek & Kovarik, 2007].

The fact that there is much more interest than code in this area indicates
that this is a hard problem. I hadn't given it much thought myself until by
chance I came across [Neumann, 2018], which describes a number of
interesting ideas. The key takeaway is that you want a graceful transition
between exhaustive search and heuristic search. In other words, if the
space of possible join orderings is just slightly larger than the maximum
allowed exhaustive search, then the search should be *almost*
exhaustive. This not only increases the chances of finding a good plan, but
also has three engineering advantages I can think of:

1) It's natural to re-use data structures etc. already used by the existing
dynamic programming (DP) algorithm. Framing the problem as extending DP
greatly lowers the barrier to making realistic progress. If the problem is
framed as "find an algorithm as a complete drop-in replacement for GEQO",
it's a riskier project in my view.

2) We can still keep GEQO around (with some huge limit by default) for a
few years as an escape hatch, while we refine the replacement. If there is
some bug that prevents finding a plan, we can emit a WARNING and fall back
to GEQO. Or if a user encounters a regression in a big query, they can
lower the limit to restore the plan they had in an earlier version.

3) It actually improves the existing exhaustive search, because the
complexity of the join order problem depends on the query shape: a "chain"
shape (linear) vs. a "star" shape (as in star schema), for the most common
examples. The size of the DP table grows like this (for n >= 4):

Chain: (n^3 - n) / 6   (including bushy plans)
Star:  (n - 1) * 2^(n - 2)

  n   chain     star

  4      10       12
  5      20       32
  6      35       80
  7      56      192
  8      84      448
  9     120     1024
 10     165     2304
 11     220     5120
 12     286    11264
 13     364    24576
 14     455    53248
 15     560   114688
 ...
 64   43680   290536219160925437952

The math behind this is described in detail in [Ono & Lohman, 1990]. I
verified this in Postgres by instrumenting the planner to count how many
times it calls make_join_rel().
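
(To make the growth rates above concrete, the table can be reproduced directly
from those formulas, e.g. with a quick SQL query:)

   SELECT n,
          (n*n*n - n) / 6         AS chain,
          (n - 1) * (2 ^ (n - 2)) AS star
   FROM generate_series(4, 15) AS n;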

Imagine having a "join enumeration budget" that, if exceeded, prevents
advancing to the next join level. Given the above numbers and a query with
some combination of chain and star shapes, a budget of 400 can guarantee an
exhaustive search when there are up to 8 relations. For a pure chain join,
we can do an exhaustive search on up to 13 relations, for a similar cost of
time and space. Out of curiosity I tested HEAD with a chain query having 64
tables found in the SQLite tests [2] and found exhaustive search to take
only twice as long as GEQO. If we have some 30-way join, and some (> 400)
budget, it's actually possible that we will complete the exhaustive search
and get the optimal plan. This is a bottom-up way of determining the
complexity. Rather than configuring a number-of-relations threshold
and possibly having exponential behavior blow up in their faces, users can
configure something that somewhat resembles the runtime cost.

Now, I'll walk through one way that a greedy heuristic can integrate with
DP. In our 30-way join example, if we use up our budget and don't have a
valid plan, we'll break out of DP at the last join level we completed.
Since we already have built a number of joinrels, we build upon that work
as we proceed. The approach I have in mind is described in [Kossmann &
Stocker, 2000], which the authors call "iterative dynamic programming"
(IDP). I'll describe one of the possible variants here. Let's say we only
got as far as join level 8, so we've created up to 8-way joinrels. We pick
the best few (maybe 5%) of these 8-way joinrels by some measure (doesn't
have to be the full cost model) and on top of each of them create a full
plan quickly: At each join level, we only pick one base relation (again by
some measure) to create one new joinrel and then move to the next join
level. This is very fast, even with hundreds of relations.

Once we have one valid, complete plan, we can technically stop at any time
(Coding this much is also a good proof-of-concept). How much additional
effort we expend to find a good plan could be another budget we have.  With
a complete plan obtained quickly, we also have an upper bound on the
measure of the cheapest overall plan, so with that we can prune any more
expensive plans as we iterate through the 8-way joinrels. Once we have a
set of complete plans, we pick some of them to 

Re: Race condition in recovery?

2021-06-09 Thread Kyotaro Horiguchi
At Wed, 09 Jun 2021 19:09:54 -0400, Tom Lane  wrote in 
> Robert Haas  writes:
> > Got it. I have now committed the patch to all branches, after adapting
> > your changes just a little bit.
> > Thanks to you and Kyotaro-san for all the time spent on this. What a slog!
> 
> conchuela failed its first encounter with this test case:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-06-09%2021%3A12%3A25
> 
> That machine has a certain, er, history of flakiness; so this may
> not mean anything.  Still, we'd better keep an eye out to see if
> the test needs more stabilization.

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=conchuela&dt=2021-06-09%2021%3A12%3A25&stg=recovery-check

> ==~_~===-=-===~_~== 
> pgsql.build/src/test/recovery/tmp_check/log/025_stuck_on_old_timeline_cascade.log
>  ==~_~===-=-===~_~==

> 2021-06-09 23:31:10.439 CEST [893820:1] LOG:  started streaming WAL from 
> primary at 0/200 on timeline 1
> 2021-06-09 23:31:10.439 CEST [893820:2] FATAL:  could not receive data from 
> WAL stream: ERROR:  requested WAL segment 00010002 has 
> already been removed

The script 025_stuck_on_old_timeline.pl (and I) forgot to set
wal_keep_size (or wal_keep_segments on older branches).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/test/recovery/t/025_stuck_on_old_timeline.pl b/src/test/recovery/t/025_stuck_on_old_timeline.pl
index 0d96bb3c15..25c2dff437 100644
--- a/src/test/recovery/t/025_stuck_on_old_timeline.pl
+++ b/src/test/recovery/t/025_stuck_on_old_timeline.pl
@@ -27,6 +27,7 @@ $perlbin =~ s{\\}{}g if ($TestLib::windows_os);
 my $archivedir_primary = $node_primary->archive_dir;
 $node_primary->append_conf('postgresql.conf', qq(
 archive_command = '$perlbin "$FindBin::RealBin/cp_history_files" "%p" "$archivedir_primary/%f"'
+wal_keep_size=128MB
 ));
 $node_primary->start;
 
diff --git a/src/test/recovery/t/025_stuck_on_old_timeline.pl b/src/test/recovery/t/025_stuck_on_old_timeline.pl
index 0d96bb3c15..8099571299 100644
--- a/src/test/recovery/t/025_stuck_on_old_timeline.pl
+++ b/src/test/recovery/t/025_stuck_on_old_timeline.pl
@@ -27,6 +27,7 @@ $perlbin =~ s{\\}{}g if ($TestLib::windows_os);
 my $archivedir_primary = $node_primary->archive_dir;
 $node_primary->append_conf('postgresql.conf', qq(
 archive_command = '$perlbin "$FindBin::RealBin/cp_history_files" "%p" "$archivedir_primary/%f"'
+wal_keep_segments=8
 ));
 $node_primary->start;
 


Re: Multiple hosts in connection string failed to failover in non-hot standby mode

2021-06-09 Thread Michael Paquier
On Wed, Jun 09, 2021 at 12:05:10PM -0400, Tom Lane wrote:
> Here's a draft patch that renames regress_ecpg_user2 to ecpg2_regression,
> which matches the name of one of the databases used, allowing the test
> cases with defaulted database name to succeed.  That gets rid of one of
> the problematic diffs.

Yeah, I agree that this does not matter much for this one, as we want
to stress the quotes and the grammar for the connections here, as
99a5619 implies.  It is good to check for the failure path as well, so
what you have here looks fine to me.

> As it stood, though, that meant that connect/test5
> wasn't exercising the connection-failure code path at all, which didn't
> seem like what we want.  So I adjusted the second place that had been
> failing to again fail on no-such-database, and stuck in gssencmode=disable
> so that we shouldn't get any test diff on hamerkop.

Using ecpg2_regression for the role goes a bit against the recent rule
to not create any role not prefixed with "regress_" as part of the
regression tests, but I am fine to live with that here.

The changes for test1 with MinGW look right, though I have not been able to
test them.
--
Michael




Re: Race condition in recovery?

2021-06-09 Thread Tom Lane
Robert Haas  writes:
> Got it. I have now committed the patch to all branches, after adapting
> your changes just a little bit.
> Thanks to you and Kyotaro-san for all the time spent on this. What a slog!

conchuela failed its first encounter with this test case:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-06-09%2021%3A12%3A25

That machine has a certain, er, history of flakiness; so this may
not mean anything.  Still, we'd better keep an eye out to see if
the test needs more stabilization.

regards, tom lane




Re: unnesting multirange data types

2021-06-09 Thread Jonathan S. Katz
On 6/9/21 4:56 PM, Alvaro Herrera wrote:
> On 2021-Jun-09, Jonathan S. Katz wrote:
> 
>> I did a couple more tests around this.
>>
>> As suspected, in PL/pgSQL, there is no way to unpack or iterate over a
>> multirange type.
> 
> Uh.  This is disappointing; the need for some way to unnest or unpack a
> multirange was mentioned multiple times in the range_agg thread.  I had
> assumed that there was some way to cast the multirange to a range array,
> or somehow convert it, but apparently that doesn't work.

Just to be pedantic with examples:

  SELECT datemultirange(
daterange(current_date, current_date + 2),
daterange(current_date + 5, current_date + 7))::daterange[];

  ERROR:  cannot cast type datemultirange to daterange[]
  LINE 1: ...2), daterange(current_date + 5, current_date + 7))::daterang...

If there were a way to cast it into an array, we could then use the
array looping construct in PL/pgSQL, but if we could only choose one, I
think it'd be more natural/less verbose to have an "unnest".
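
(In other words, what is missing is something along these lines --
hypothetical, since no such function exists yet:)

  SELECT unnest(datemultirange(
      daterange(current_date, current_date + 2),
      daterange(current_date + 5, current_date + 7)));
  -- would return one daterange per row, analogous to unnest(anyarray)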

> If the supporting pieces are mostly there, then I opine we should add
> something.

Agreed.

Jonathan





Re: Estimating HugePages Requirements?

2021-06-09 Thread Mark Dilger



> On Jun 9, 2021, at 1:52 PM, Bossart, Nathan  wrote:
> 
> I'd be happy to clean it up and submit it for
> discussion in pgsql-hackers@ if there is interest.

Yes, I'd like to see it.  Thanks for offering.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







BF assertion failure on mandrill in walsender, v13

2021-06-09 Thread Thomas Munro
Hi,

Not sure if there is much chance of debugging this one-off failure
without a backtrace (long shot: any chance there's still a core
file?), but for the record: mandrill choked on a null pointer passed
to GetMemoryChunkContext() inside a walsender running logical
replication.  Possibly via pfree(NULL), but there are other paths.
That's an animal running with force_parallel_mode and
RANDOMIZE_ALLOCATED_MEMORY, on AIX with IBM compiler in 32 bit mode,
so unusual in several ways.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2021-06-06%2015:37:23




Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

2021-06-09 Thread Tom Lane
Jeff Davis  writes:
> On Wed, 2021-06-09 at 16:50 +0530, Amit Kapila wrote:
>> BTW, can't we consider it to be part of
>> create_slot_opt_list?

> Yes, that would be better. It looks like the physical and logical slot
> options are mixed together -- should they be separated in the grammar
> so that using an option with the wrong kind of slot would be a parse
> error?

That sort of parse error is usually pretty unfriendly to users who
may not quite remember which options are for what; all they'll get
is "syntax error" which won't illuminate anything.  I'd rather let
the grammar accept both, and throw an appropriate error further
downstream.

regards, tom lane




Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

2021-06-09 Thread Jeff Davis
On Wed, 2021-06-09 at 17:27 +0530, Amit Kapila wrote:
> 2. In the main patch [1], we do send two_phase option even during
> START_REPLICATION for the very first time when the two_phase can be
> enabled. There are reasons as described in the worker.c why we can't
> enable it during CREATE_REPLICATION_SLOT. 

I'll have to catch up on the thread to digest that reasoning and how it
applies to decoding vs. replication. But there don't seem to be changes
to START_REPLICATION for twophase, so I don't quite follow your point.

Are you saying that we should not be able to create slots with twophase
at all until the rest of the changes go in?

> Now, if we want to do
> protocol changes, I wonder why only do some changes and leave the
> rest
> for the next version?

I started this thread because it's possible to create a slot a certain
way using the SQL function create_logical_replication_slot(), but it's
impossible over the replication protocol. That seems inconsistent to
me.
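
(For reference, the SQL-level route that already works looks roughly like this
-- a sketch assuming the v14 function signature -- while the walsender's
CREATE_REPLICATION_SLOT command currently has no equivalent option:)

   SELECT pg_create_logical_replication_slot('myslot', 'test_decoding',
                                              false,  -- temporary
                                              true);  -- two_phase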

Regards,
Jeff Davis






Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

2021-06-09 Thread Jeff Davis
On Wed, 2021-06-09 at 16:50 +0530, Amit Kapila wrote:
> BTW, can't we consider it to be part of
> create_slot_opt_list?

Yes, that would be better. It looks like the physical and logical slot
options are mixed together -- should they be separated in the grammar
so that using an option with the wrong kind of slot would be a parse
error?

Regards,
Jeff Davis






Re: Race condition in recovery?

2021-06-09 Thread Robert Haas
On Wed, Jun 9, 2021 at 4:07 AM Dilip Kumar  wrote:
> Reason for the problem was that the "-Xnone" parameter was not
> accepted by "sub backup" in PostgresNode.pm so I created that for
> backpatch.  With attached patches I am to make it pass in v12,v11,v10
> (with fix) and fail (without fix).  However, we will have to make some
> change for 9.6 because pg_basebackup doesn't support -Xnone on 9.6,
> maybe we can delete the content from pg_wal after the backup, if we
> think that approach looks fine then I will make the changes for 9.6 as
> well.

Got it. I have now committed the patch to all branches, after adapting
your changes just a little bit.

Thanks to you and Kyotaro-san for all the time spent on this. What a slog!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: unnesting multirange data types

2021-06-09 Thread Alvaro Herrera
On 2021-Jun-09, Jonathan S. Katz wrote:

> I did a couple more tests around this.
> 
> As suspected, in PL/pgSQL, there is no way to unpack or iterate over a
> multirange type.

Uh.  This is disappointing; the need for some way to unnest or unpack a
multirange was mentioned multiple times in the range_agg thread.  I had
assumed that there was some way to cast the multirange to a range array,
or somehow convert it, but apparently that doesn't work.

If the supporting pieces are mostly there, then I opine we should add
something.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"Hay dos momentos en la vida de un hombre en los que no debería
especular: cuando puede permitírselo y cuando no puede" (Mark Twain)




Re: pg_stat_progress_create_index vs. parallel index builds

2021-06-09 Thread Alvaro Herrera
On 2021-Jun-04, Greg Nancarrow wrote:

> I tested with and without the patch, using the latest PG14 source as
> of today, and can confirm that without the patch applied, the "sorting
> live tuples" phase is not reported in the parallel-case, but with the
> patch applied it then does get reported in that case. I also confirmed
> that, as you said, the patch only addresses the usual case where the
> parallel leader participates in the parallel operation.
> What is slightly puzzling to me (and perhaps digging deeper will
> reveal it) is why this "sorting live tuples" phase seems so short in
> the serial case compared to the parallel case?
> For example, in my test I created an index on a column of a table
> having 10 million records, and it took about 40 seconds, during which
> the "sorting live tuples" phase seemed to take about 8 seconds. Yet
> for the serial case, index creation took about 75 seconds, during
> which the "sorting live tuples" phase seemed to take about 1 second.

I think the reason is that scanning the table is not just scanning the
table -- it is also feeding tuples to tuplesort, which internally is
already sorting them as it receives them.  So by the time you're done
scanning the relation, some (large) fraction of the sorting work is
already done, which is why the "sorting" phase is so short.
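
(For anyone following along, the phases can be watched from another session
while the index build runs, using the existing progress view:)

   SELECT pid, phase, tuples_done, tuples_total
     FROM pg_stat_progress_create_index;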


Tracing sort is not easy.  We discussed this earlier; see
https://postgr.es/m/20181218210159.xtkltzm7flrwsm55@alvherre.pgsql
for example.

-- 
Álvaro Herrera   Valdivia, Chile
Maybe there's lots of data loss but the records of data loss are also lost.
(Lincoln Yeoh)




Re: Estimating HugePages Requirements?

2021-06-09 Thread Bossart, Nathan
moving to pgsql-hackers@

On 6/9/21, 9:41 AM, "Don Seiler"  wrote:
> I'm trying to set up a chef recipe to reserve enough HugePages on a
> linux system for our PG servers. A given VM will only host one PG
> cluster and that will be the only thing on that host that uses
> HugePages. Blogs that I've seen suggest that it would be as simple
> as taking the shared_buffers setting and dividing that by 2MB (huge
> page size), however I found that I needed some more.
>
> In my test case, shared_buffers is set to 4003MB (calculated by
> chef) but PG failed to start until I reserved a few hundred more MB.
> When I checked VmPeak, it was 4321MB, so I ended up having to
> reserve over 2161 huge pages, over a hundred more than I had
> originally thought.
>
> I'm told other factors contribute to this additional memory
> requirement, such as max_connections, wal_buffers, etc. I'm
> wondering if anyone has been able to come up with a reliable method
> for determining the HugePages requirements for a PG cluster based on
> the GUC values (that would be known at deployment time).
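
(Working through the quoted numbers, assuming 2 MB huge pages: shared_buffers
alone suggests ceil(4003 / 2) = 2002 pages, while the observed VmPeak gives
ceil(4321 / 2) = 2161 pages -- roughly 160 extra pages for shared memory beyond
shared_buffers.)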

In RDS, we've added a pg_ctl option that returns the amount of shared
memory required.  Basically, we start up postmaster just enough to get
an accurate value from CreateSharedMemoryAndSemaphores() and then shut
down.  The patch is quite battle-tested at this point (we first
started using it in 2017, and we've been enabling huge pages by
default since v10).  I'd be happy to clean it up and submit it for
discussion in pgsql-hackers@ if there is interest.

Nathan



Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

2021-06-09 Thread Peter Geoghegan
On Wed, Jun 9, 2021 at 11:45 AM Andres Freund  wrote:
> Good find!

+1

> > The attached patch fixes this inconsistency
>
> I think I prefer applying the fix and the larger changes separately.

I wonder if it's worth making the goto inside lazy_scan_prune verify
that the heap tuple matches what we expect. I'm sure that we would
have found this issue far sooner if that had been in place already.
Though I'm less sure of how much value adding such a check has now.

-- 
Peter Geoghegan




Re: alter table set TABLE ACCESS METHOD

2021-06-09 Thread Zhihong Yu
On Wed, Jun 9, 2021 at 12:31 PM Jeff Davis  wrote:

> On Wed, 2021-06-09 at 13:47 +0900, Michael Paquier wrote:
> > There is a mix of upper and lower-case characters here.  It could be
> > more consistent.  It seems to me that this test should actually check
> > that pg_class.relam reflects the new value.
>
> Done. I also added a (negative) test for changing the AM of a
> partitioned table, which wasn't caught before.
>
> > +errmsg("cannot have multiple SET ACCESS METHOD
> > subcommands")));
> > Worth adding a test?
>
> Done.
>
> > Nit: the name of the variable looks inconsistent with this comment.
> > The original is weird as well.
>
> Tried to improve wording.
>
> > I am wondering if it would be a good idea to set the new tablespace
> > and new access method fields to InvalidOid within
> > ATGetQueueEntry.  We
> > do that for the persistence.  Not critical at all, still..
>
> Done.
>
> > +   pass = AT_PASS_MISC;
> > Maybe add a comment here?
>
> Done. In that case, it doesn't matter because there's no work to be
> done in Phase 2.
>
> Regards,
> Jeff Davis
>
> Hi,

+   /* check if another access method change was already requested */
+   if (tab->newAccessMethod)
+   ereport(ERROR,
+   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+errmsg("cannot change access method setting twice")));

I think the error message can be refined - changing  access method twice is
supported, as long as the two changes don't overlap.

Cheers


Re: Adjust pg_regress output for new long test names

2021-06-09 Thread Robert Haas
On Wed, Jun 9, 2021 at 1:37 PM Peter Eisentraut
 wrote:
> Can we scan all the test names first and then pick a suitable length?

FWIW, I think this discussion of shortening the test case names is
probably going in the wrong direction. It's true that in many cases
such a thing can be done, but it's also true that the test case
authors picked those names because they felt that they described those
test cases well. It's not unlikely that future test case authors will
have similar feelings and will again pick names that are a little bit
longer. It's also not impossible that in shortening the names we will
make them less clear. For example, Peter said that "partition" was
redundant in something like "detach-partition-concurrently-4," but
that is only true if you think that a partition is the only thing that
can be detached. That is true today as far as the SQL grammar is
concerned, but from a source code perspective we speak of detaching
from shm_mq objects or DSMs, and there could be more things, internal
or SQL-visible, in the future.

Now I don't care all that much; this isn't worth getting worked up
about. But if it were me, I'd tend to err in the direction of
accommodating longer test names, and only shorten them if it's clear
that someone *really* went overboard.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: unnesting multirange data types

2021-06-09 Thread Jonathan S. Katz
On 6/9/21 3:44 PM, Jonathan S. Katz wrote:
> On 6/9/21 3:25 PM, Tom Lane wrote:
>> "Jonathan S. Katz"  writes:
>>> I would like to decompose the returned multirange into its individual
>>> ranges, similarly to how I would "unnest" an array:
>>
>> +1 for adding such a feature, but I suppose it's too late for v14.
> 
> Well, the case I would make for v14 is that, as of right now, the onus
> is on the driver writers / application developers to be able to unpack
> the multiranges.
> 
> I haven't tried manipulating a multirange in a PL like Python, maybe
> some exploration there would unveil more or less pain, or if it could be
> iterated over in PL/pgSQL (I'm suspecting no).

I did a couple more tests around this.

As suspected, in PL/pgSQL, there is no way to unpack or iterate over a
multirange type.

In PL/Python, both range types and multirange types are treated as
strings. From there, you can at least ultimately parse and manipulate it
into your preferred Python types, but this goes back to my earlier point
about putting the onus on the developer to do so.

Thanks,

Jonathan





Re: unnesting multirange data types

2021-06-09 Thread Jonathan S. Katz
On 6/9/21 3:25 PM, Tom Lane wrote:
> "Jonathan S. Katz"  writes:
>> I would like to decompose the returned multirange into its individual
>> ranges, similarly to how I would "unnest" an array:
> 
> +1 for adding such a feature, but I suppose it's too late for v14.

Well, the case I would make for v14 is that, as of right now, the onus
is on the driver writers / application developers to be able to unpack
the multiranges.

Maybe it's not terrible as of this moment -- I haven't tried testing it
that far yet -- but it may make it a bit more challenging to work with
these types outside of Postgres. I recall a similar issue when initially
trying to integrate range types into my apps back in the v9.2 days, and
I ended up writing some grotty code to handle it. Yes, I worked around
it, but I would have preferred not to have to.

An "unnest" at least lets us bridge the gap a bit, i.e. if you really
need to introspect a multirange type, you have a way of getting it into
a familiar format.

I haven't tried manipulating a multirange in a PL like Python, maybe
some exploration there would unveil more or less pain, or if it could be
iterated over in PL/pgSQL (I'm suspecting no).

That all said, for writing queries within Postgres, the multiranges make
a lot of operations way easier. I do think a missing "unnest" function
does straddle the line of "omission" and "new feature," so I can
understand if it does not make it into v14.

> AFAICS, "unnest(anymultirange) returns setof anyrange" could coexist
> alongside the existing variants of unnest(), so I don't see any
> fundamental stumbling block to having it.

Cool. I was initially throwing out "unnest" as the name because it mirrors
what we currently have with arrays, and seems to be doing something
similar. Open to other names, but this was the one that I was drawn to.
A "multirange" is an "ordered array of ranges", after all.

Thanks,

Jonathan





Re: Add PortalDrop in exec_execute_message

2021-06-09 Thread Alvaro Herrera
On 2021-Jun-09, Tom Lane wrote:

> I wrote:
> > It turns out that the problem is specific to SELECT FOR UPDATE, and
> > it happens because nodeLockRows is not careful to shut down the
> > EvalPlanQual mechanism it uses before returning NULL at the end of
> > a scan.  If EPQ has been fired, it'll be holding a tuple slot
> > referencing whatever tuple it was last asked about.  The attached
> > trivial patch seems to take care of the issue nicely, while adding
> > little if any overhead.  (A repeat call to EvalPlanQualEnd doesn't
> > do much.)

Thanks for researching that -- good find.

> BTW, to be clear: I think Alvaro's change is also necessary.
> If libpq is going to silently do something different in pipeline
> mode than it does in normal mode, it should strive to minimize
> the semantic difference.  exec_simple_query includes a PortalDrop,
> so we'd best do the same in pipeline mode.  But this LockRows
> misbehavior would be visible in other operating modes anyway.

Agreed.  I'll get it pushed after the patch I'm currently looking at.

-- 
Álvaro Herrera               39°49'30"S 73°17'W




Re: alter table set TABLE ACCESS METHOD

2021-06-09 Thread Jeff Davis
On Wed, 2021-06-09 at 13:47 +0900, Michael Paquier wrote:
> There is a mix of upper and lower-case characters here.  It could be
> more consistent.  It seems to me that this test should actually check
> that pg_class.relam reflects the new value.

Done. I also added a (negative) test for changing the AM of a
partitioned table, which wasn't caught before.

> +errmsg("cannot have multiple SET ACCESS METHOD
> subcommands")));
> Worth adding a test?

Done.

> Nit: the name of the variable looks inconsistent with this comment.
> The original is weird as well.

Tried to improve wording.

> I am wondering if it would be a good idea to set the new tablespace
> and new access method fields to InvalidOid within
> ATGetQueueEntry.  We
> do that for the persistence.  Not critical at all, still..

Done.

> +   pass = AT_PASS_MISC;
> Maybe add a comment here?

Done. In that case, it doesn't matter because there's no work to be
done in Phase 2.

Regards,
Jeff Davis

From 051d067e07eaec29d221641da3bf28d0dd0731c8 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Wed, 5 May 2021 14:28:59 -0700
Subject: [PATCH] ALTER TABLE ... SET ACCESS METHOD

Adds support for changing the access method of a table with a
rewrite.

Author: Justin Pryzby, Jeff Davis
---
 doc/src/sgml/ref/alter_table.sgml   | 20 +++
 src/backend/commands/cluster.c  | 21 +---
 src/backend/commands/matview.c  |  5 +-
 src/backend/commands/tablecmds.c| 71 +++--
 src/backend/parser/gram.y   |  8 +++
 src/bin/psql/tab-complete.c | 10 ++--
 src/include/commands/cluster.h  |  4 +-
 src/include/commands/event_trigger.h|  1 +
 src/include/nodes/parsenodes.h  |  1 +
 src/test/regress/expected/create_am.out | 34 
 src/test/regress/sql/create_am.sql  | 21 
 11 files changed, 178 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 939d3fe2739..5ac13a5c0f6 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -75,6 +75,7 @@ ALTER TABLE [ IF EXISTS ] name
 CLUSTER ON index_name
 SET WITHOUT CLUSTER
 SET WITHOUT OIDS
+SET ACCESS METHOD new_access_method
 SET TABLESPACE new_tablespace
 SET { LOGGED | UNLOGGED }
 SET ( storage_parameter [= value] [, ... ] )
@@ -693,6 +694,16 @@ WITH ( MODULUS numeric_literal, REM
 

 
+   
+SET ACCESS METHOD
+
+ 
+  This form changes the access method of the table by rewriting it. See
+   for more information.
+ 
+
+   
+

 SET TABLESPACE
 
@@ -1229,6 +1240,15 @@ WITH ( MODULUS numeric_literal, REM
   
  
 
+ 
+  new_access_method
+  
+   
+The name of the access method to which the table will be converted.
+   
+  
+ 
+
  
   new_tablespace
   
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 6487a9e3fcb..b3d8b6deb03 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -576,6 +576,7 @@ static void
 rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
+	Oid			accessMethod = OldHeap->rd_rel->relam;
 	Oid			tableSpace = OldHeap->rd_rel->reltablespace;
 	Oid			OIDNewHeap;
 	char		relpersistence;
@@ -597,6 +598,7 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 
 	/* Create the transient table that will receive the re-ordered data */
 	OIDNewHeap = make_new_heap(tableOid, tableSpace,
+			   accessMethod,
 			   relpersistence,
 			   AccessExclusiveLock);
 
@@ -618,16 +620,16 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 /*
  * Create the transient table that will be filled with new data during
  * CLUSTER, ALTER TABLE, and similar operations.  The transient table
- * duplicates the logical structure of the OldHeap, but is placed in
- * NewTableSpace which might be different from OldHeap's.  Also, it's built
- * with the specified persistence, which might differ from the original's.
+ * duplicates the logical structure of the OldHeap; but will have the
+ * specified physical storage properties NewTableSpace, NewAccessMethod, and
+ * relpersistence.
  *
  * After this, the caller should load the new heap with transferred/modified
  * data, then call finish_heap_swap to complete the operation.
  */
 Oid
-make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
-			  LOCKMODE lockmode)
+make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
+			  char relpersistence, LOCKMODE lockmode)
 {
 	TupleDesc	OldHeapDesc;
 	char		NewHeapName[NAMEDATALEN];
@@ -686,7 +688,7 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
 		  InvalidOid,
 		  InvalidOid,
 		  OldHeap->rd_rel->relowner,
-		  OldHeap->rd_rel->relam,
+		

Re: unnesting multirange data types

2021-06-09 Thread Tom Lane
"Jonathan S. Katz"  writes:
> I would like to decompose the returned multirange into its individual
> ranges, similarly to how I would "unnest" an array:

+1 for adding such a feature, but I suppose it's too late for v14.

AFAICS, "unnest(anymultirange) returns setof anyrange" could coexist
alongside the existing variants of unnest(), so I don't see any
fundamental stumbling block to having it.

regards, tom lane




Re: Add PortalDrop in exec_execute_message

2021-06-09 Thread Tom Lane
I wrote:
> It turns out that the problem is specific to SELECT FOR UPDATE, and
> it happens because nodeLockRows is not careful to shut down the
> EvalPlanQual mechanism it uses before returning NULL at the end of
> a scan.  If EPQ has been fired, it'll be holding a tuple slot
> referencing whatever tuple it was last asked about.  The attached
> trivial patch seems to take care of the issue nicely, while adding
> little if any overhead.  (A repeat call to EvalPlanQualEnd doesn't
> do much.)

BTW, to be clear: I think Alvaro's change is also necessary.
If libpq is going to silently do something different in pipeline
mode than it does in normal mode, it should strive to minimize
the semantic difference.  exec_simple_query includes a PortalDrop,
so we'd best do the same in pipeline mode.  But this LockRows
misbehavior would be visible in other operating modes anyway.

regards, tom lane




Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

2021-06-09 Thread Andres Freund
Hi,

Good find!

On 2021-06-09 17:42:34 +0200, Matthias van de Meent wrote:
> I believe that I've found the culprit:
> GetOldestNonRemovableTransactionId(rel) does not use the exact same
> conditions for returning OldestXmin as GlobalVisTestFor(rel) does.
> This results in different minimal XIDs, and subsequently this failure.

Specifically, the issue is that it uses the innocuous looking

else if (RelationIsAccessibleInLogicalDecoding(rel))
return horizons.catalog_oldest_nonremovable;

but that's not sufficient, because

#define RelationIsAccessibleInLogicalDecoding(relation) \
(XLogLogicalInfoActive() && \
 RelationNeedsWAL(relation) && \
 (IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation)))

it is never true if wal_level < logical. So what it is missing is the
IsCatalogRelation(rel) || bit.


> The attached patch fixes this inconsistency

I think I prefer applying the fix and the larger changes separately.


> Another approach might be changing GlobalVisTestFor(rel) instead to
> reflect the conditions in GetOldestNonRemovableTransactionId.

No, that'd not be correct, afaict.

Greetings,

Andres Freund




unnesting multirange data types

2021-06-09 Thread Jonathan S. Katz
Hi,

I have been exploring multirange data types using PostgreSQL 14 Beta 1.
Thus far I'm really happy with the user experience, and it has allowed
me to simplify some previously onerous queries!

I do have a question about trying to "unnest" a multirange type into its
individual ranges. For example, I have a query where I want to find the
availability over a given week. This query may look something like:

  SELECT datemultirange(daterange(CURRENT_DATE, CURRENT_DATE + 7))
       - datemultirange(daterange(CURRENT_DATE + 2, CURRENT_DATE + 4))
       as availability;

                     availability
  ----------------------------------------------------
   {[2021-06-09,2021-06-11),[2021-06-13,2021-06-16)}
  (1 row)

I would like to decompose the returned multirange into its individual
ranges, similarly to how I would "unnest" an array:

  SELECT * FROM unnest(ARRAY[1,2,3]);
   unnest
  --------
        1
        2
        3
  (3 rows)

So something like:

  SELECT unnest('{[2021-06-09,2021-06-11),
                  [2021-06-13,2021-06-16)}'::datemultirange);

           unnest
  -------------------------
   [2021-06-09,2021-06-11)
   [2021-06-13,2021-06-16)
  (2 rows)

I looked at the various functions + operators available for the
multirange types in the documentation but could not find anything that
could perform this action.

Does this functionality exist?

Thanks,

Jonathan





Re: Character expansion with ICU collations

2021-06-09 Thread Tom Lane
Peter Eisentraut  writes:
> You can have these queries return both rows if you use an 
> accent-ignoring collation, like this example in the documentation:

> CREATE COLLATION ignore_accents (provider = icu, locale = 
> 'und-u-ks-level1-kc-true', deterministic = false);

It occurs to me to wonder whether texteq() still obeys transitivity
when using such a collation.
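
For instance (a sketch, assuming the ignore_accents collation from that
documentation example has been created), one could probe it with:

    SELECT 'ss' = 'ß'  COLLATE ignore_accents AS a_eq_b,
           'ß' = 'SS' COLLATE ignore_accents AS b_eq_c,
           'ss' = 'SS' COLLATE ignore_accents AS a_eq_c;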

regards, tom lane




Re: Character expansion with ICU collations

2021-06-09 Thread Peter Eisentraut

On 09.06.21 17:31, Finnerty, Jim wrote:
CREATE COLLATION CI_AS (provider = icu, 
locale=’utf8@colStrength=secondary’, deterministic = false);


CREATE TABLE MyTable3
(

     ID INT IDENTITY(1, 1),
 Comments VARCHAR(100)

)

INSERT INTO MyTable3 (Comments) VALUES ('strasse')
INSERT INTO MyTable3 (Comments) VALUES ('straße')
SELECT * FROM MyTable3 WHERE Comments COLLATE CI_AS = 'strasse'
SELECT * FROM MyTable3 WHERE Comments COLLATE CI_AS = 'straße'

We would like to control whether each SELECT statement finds both 
records (because the sort key of ‘ß’ equals the sort key of ‘ss’), or 
whether each SELECT statement finds just one record.


You can have these queries return both rows if you use an 
accent-ignoring collation, like this example in the documentation:


CREATE COLLATION ignore_accents (provider = icu, locale = 
'und-u-ks-level1-kc-true', deterministic = false);





Re: Adjust pg_regress output for new long test names

2021-06-09 Thread Peter Eisentraut

On 09.06.21 03:57, Thomas Munro wrote:

test deadlock-simple  ... ok   20 ms
test deadlock-hard... ok10624 ms
test deadlock-soft... ok  147 ms
test deadlock-soft-2  ... ok 5154 ms
test deadlock-parallel... ok  132 ms
test detach-partition-concurrently-1 ... ok  553 ms
test detach-partition-concurrently-2 ... ok  234 ms
test detach-partition-concurrently-3 ... ok 2389 ms
test detach-partition-concurrently-4 ... ok 1876 ms

Any objections to making these new tests line up with the rest?


Can we scan all the test names first and then pick a suitable length?






Re: Adjust pg_regress output for new long test names

2021-06-09 Thread Peter Eisentraut

On 09.06.21 04:51, Noah Misch wrote:

On Wed, Jun 09, 2021 at 01:57:45PM +1200, Thomas Munro wrote:

test deadlock-simple  ... ok   20 ms
test deadlock-hard... ok10624 ms
test deadlock-soft... ok  147 ms
test deadlock-soft-2  ... ok 5154 ms
test deadlock-parallel... ok  132 ms
test detach-partition-concurrently-1 ... ok  553 ms
test detach-partition-concurrently-2 ... ok  234 ms
test detach-partition-concurrently-3 ... ok 2389 ms
test detach-partition-concurrently-4 ... ok 1876 ms
Make the test output visually consistent, as previously done by commit
14378245.

Not bad, but I would instead shorten the names to detach-[1234] or
detach-partition-[1234].  The marginal value of the second word is low, and
the third word helps even less.


DETACH CONCURRENTLY is a separate feature from plain DETACH.

But "partition" is surely redundant here.




Re: Add PortalDrop in exec_execute_message

2021-06-09 Thread Tom Lane
I wrote:
> I'm still wondering though why Yura is observing resources remaining
> held by an executed-to-completion Portal.  I think investigating that
> might be more useful than tinkering with pipeline mode.

I got a chance to look into this finally.  The lens I've been looking
at this through is "why are we still holding any buffer pins when
ExecutorRun finishes?".  Normal table scan nodes won't do that.

It turns out that the problem is specific to SELECT FOR UPDATE, and
it happens because nodeLockRows is not careful to shut down the
EvalPlanQual mechanism it uses before returning NULL at the end of
a scan.  If EPQ has been fired, it'll be holding a tuple slot
referencing whatever tuple it was last asked about.  The attached
trivial patch seems to take care of the issue nicely, while adding
little if any overhead.  (A repeat call to EvalPlanQualEnd doesn't
do much.)

regards, tom lane

diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index b2e5c30079..7583973f4a 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -59,7 +59,11 @@ lnext:
 	slot = ExecProcNode(outerPlan);
 
 	if (TupIsNull(slot))
+	{
+		/* Release any resources held by EPQ mechanism before exiting */
+		EvalPlanQualEnd(&node->lr_epqstate);
 		return NULL;
+	}
 
 	/* We don't need EvalPlanQual unless we get updated tuple version(s) */
 	epq_needed = false;
@@ -381,6 +385,7 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
 void
 ExecEndLockRows(LockRowsState *node)
 {
+	/* We may have shut down EPQ already, but no harm in another call */
 	EvalPlanQualEnd(&node->lr_epqstate);
 	ExecEndNode(outerPlanState(node));
 }


Re: Race condition in recovery?

2021-06-09 Thread Robert Haas
On Wed, Jun 9, 2021 at 4:07 AM Dilip Kumar  wrote:
> Reason for the problem was that the "-Xnone" parameter was not
> accepted by "sub backup" in PostgresNode.pm so I created that for
> backpatch.  With the attached patches I am able to make it pass in v12, v11, v10
> (with fix) and fail (without fix).  However, we will have to make some
> change for 9.6 because pg_basebackup doesn't support -Xnone on 9.6,
> maybe we can delete the content from pg_wal after the backup, if we
> think that approach looks fine then I will make the changes for 9.6 as
> well.

Ah. I looked into this and found that this is because commit
081876d75ea15c3bd2ee5ba64a794fd8ea46d794 is new in master, so actually
that change is absent in all the back-branches. I have now back-ported
that portion of that commit to v13, v12, v11, and v10.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: when the startup process doesn't

2021-06-09 Thread Justin Pryzby
On Wed, Jun 09, 2021 at 05:09:54PM +0530, Nitin Jadhav wrote:
> > +   {"log_min_duration_startup_process", PGC_SUSET, 
> > LOGGING_WHEN,
> >
> > I think it should be PGC_SIGHUP, to allow changing it during runtime.
> > Obviously it has no effect except during startup, but the change will be
> > effective if the current process crashes.
> > See also: 
> > https://www.postgresql.org/message-id/20210526001359.ge3...@telsasoft.com
> 
> I did not get exactly how it will change behaviour. In my
> understanding, when the server restarts after a crash, it fetches the
> value from the config file. So if there is any change, that gets
> applied. Kindly correct me if I am wrong.

I don't think so.  I checked and SelectConfigFiles is called only once to read
config files and cmdline args.  And not called on restart_after_crash.

The GUC definitely isn't SUSET, since it's not useful to write in a (super)
user session SET log_min_duration_startup_process=123.

I've triple checked the behavior using a patch I submitted for Thomas' syncfs
feature.  ALTER SYSTEM recovery_init_sync_method=syncfs was not picked up when
I sent SIGABRT.  But with my patch, if I also do SELECT pg_reload_conf(), then
a future crash uses syncfs.
https://www.postgresql.org/message-id/20210526001359.ge3...@telsasoft.com
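
In other words, roughly:

    ALTER SYSTEM SET recovery_init_sync_method = 'syncfs';
    SELECT pg_reload_conf();   -- without this, a later crash-restart keeps using the old value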

-- 
Justin




Re: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-06-09 Thread Robert Haas
On Wed, Jun 9, 2021 at 2:43 AM Tom Lane  wrote:
> There are specific cases where there's a good reason to worry.
> For example, if we assume blindly that domain_in() is parallel
> safe, we will have cause to regret that.  But I don't find that
> to be a reason why we need to lock down everything everywhere.
> We need to understand the tradeoffs involved in what we check,
> and apply checks that are likely to avoid problems, while not
> being too nanny-ish.

Yeah, that's exactly how I feel about it, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Multiple hosts in connection string failed to failover in non-hot standby mode

2021-06-09 Thread Tom Lane
I wrote:
> ...  I'd be okay with dropping that test; or maybe we could
> fix things so that the default case succeeds?

Here's a draft patch that renames regress_ecpg_user2 to ecpg2_regression,
which matches the name of one of the databases used, allowing the test
cases with defaulted database name to succeed.  That gets rid of one of
the problematic diffs.  As it stood, though, that meant that connect/test5
wasn't exercising the connection-failure code path at all, which didn't
seem like what we want.  So I adjusted the second place that had been
failing to again fail on no-such-database, and stuck in gssencmode=disable
so that we shouldn't get any test diff on hamerkop.

regards, tom lane

diff --git a/src/interfaces/ecpg/test/Makefile b/src/interfaces/ecpg/test/Makefile
index be53b7b94d..abea3fcc85 100644
--- a/src/interfaces/ecpg/test/Makefile
+++ b/src/interfaces/ecpg/test/Makefile
@@ -77,7 +77,7 @@ $(remaining_files_build): $(abs_builddir)/%: $(srcdir)/%
 endif
 
 # Common options for tests. Also pick up anything passed in EXTRA_REGRESS_OPTS
-REGRESS_OPTS = --dbname=ecpg1_regression,ecpg2_regression --create-role=regress_ecpg_user1,regress_ecpg_user2 $(EXTRA_REGRESS_OPTS)
+REGRESS_OPTS = --dbname=ecpg1_regression,ecpg2_regression --create-role=regress_ecpg_user1,ecpg2_regression $(EXTRA_REGRESS_OPTS)
 
 check: all
 	$(with_temp_install) ./pg_regress $(REGRESS_OPTS) --temp-instance=./tmp_check $(TEMP_CONF) --bindir= $(pg_regress_locale_flags) $(THREAD) --schedule=$(srcdir)/ecpg_schedule sql/twophase
diff --git a/src/interfaces/ecpg/test/connect/test1.pgc b/src/interfaces/ecpg/test/connect/test1.pgc
index 961bd72ef2..77e571ec86 100644
--- a/src/interfaces/ecpg/test/connect/test1.pgc
+++ b/src/interfaces/ecpg/test/connect/test1.pgc
@@ -26,16 +26,16 @@ exec sql end declare section;
 	exec sql connect to ecpg2_regression@localhost as main;
 	exec sql disconnect main;
 
-	exec sql connect to @localhost as main user regress_ecpg_user2;
+	exec sql connect to @localhost as main user ecpg2_regression;
 	exec sql disconnect main;
 
-	/* exec sql connect to :@TEMP_PORT@ as main user regress_ecpg_user2;
+	/* exec sql connect to :@TEMP_PORT@ as main user ecpg2_regression;
 	exec sql disconnect main; */
 
 	exec sql connect to tcp:postgresql://localhost/ecpg2_regression user regress_ecpg_user1 identified by connectpw;
 	exec sql disconnect;
 
-	exec sql connect to tcp:postgresql://localhost/ user regress_ecpg_user2;
+	exec sql connect to tcp:postgresql://localhost/ user ecpg2_regression;
 	exec sql disconnect;
 
 	strcpy(pw, "connectpw");
diff --git a/src/interfaces/ecpg/test/connect/test5.pgc b/src/interfaces/ecpg/test/connect/test5.pgc
index e712fa8778..96a7b3e892 100644
--- a/src/interfaces/ecpg/test/connect/test5.pgc
+++ b/src/interfaces/ecpg/test/connect/test5.pgc
@@ -21,7 +21,7 @@ exec sql end declare section;
 	ECPGdebug(1, stderr);
 
 	exec sql connect to ecpg2_regression as main;
-	exec sql alter user regress_ecpg_user2 ENCRYPTED PASSWORD 'insecure';
+	exec sql alter user ecpg2_regression ENCRYPTED PASSWORD 'insecure';
 	exec sql alter user regress_ecpg_user1 ENCRYPTED PASSWORD 'connectpw';
 	exec sql commit;
 	exec sql disconnect;  /* <-- "main" not specified */
@@ -40,7 +40,7 @@ exec sql end declare section;
 	exec sql connect to 'ecpg2_regression' as main;
 	exec sql disconnect main;
 
-	exec sql connect to as main user regress_ecpg_user2/insecure;
+	exec sql connect to as main user ecpg2_regression/insecure;
 	exec sql disconnect main;
 
 	exec sql connect to ecpg2_regression as main user regress_ecpg_user1/connectpw;
@@ -61,7 +61,7 @@ exec sql end declare section;
 	exec sql connect to "unix:postgresql://200.46.204.71/ecpg2_regression" as main user regress_ecpg_user1/connectpw;
 	exec sql disconnect main;
 
-	exec sql connect to unix:postgresql://localhost/ as main user regress_ecpg_user2 IDENTIFIED BY insecure;
+	exec sql connect to "unix:postgresql://localhost/no_such_db?gssencmode=disable" as main user ecpg2_regression IDENTIFIED BY insecure;
 	exec sql disconnect main;
 
 	/* connect twice */
diff --git a/src/interfaces/ecpg/test/expected/connect-test1-minGW32.stderr b/src/interfaces/ecpg/test/expected/connect-test1-minGW32.stderr
index c5b5248eb2..e5f16bd854 100644
--- a/src/interfaces/ecpg/test/expected/connect-test1-minGW32.stderr
+++ b/src/interfaces/ecpg/test/expected/connect-test1-minGW32.stderr
@@ -14,30 +14,18 @@
 [NO_PID]: sqlca: code: 0, state: 0
 [NO_PID]: ecpg_finish: connection main closed
 [NO_PID]: sqlca: code: 0, state: 0
-[NO_PID]: ECPGconnect: opening database  on localhost port   for user regress_ecpg_user2
-[NO_PID]: sqlca: code: 0, state: 0
-[NO_PID]: ECPGconnect: connection to server failed: FATAL:  database "regress_ecpg_user2" does not exist
+[NO_PID]: ECPGconnect: opening database  on localhost port   for user ecpg2_regression
 [NO_PID]: sqlca: code: 0, state: 0
 [NO_PID]: ecpg_finish: connection main closed
 [NO_PID]: sqlca: code: 

Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

2021-06-09 Thread Matthias van de Meent
On Wed, 9 Jun 2021 at 04:42, Michael Paquier  wrote:
>
> On Tue, Jun 08, 2021 at 05:47:28PM -0700, Peter Geoghegan wrote:
> > I don't have time to try this out myself today, but offhand I'm pretty
> > confident that this is sufficient to reproduce the underlying bug
> > itself. And if that's true then I guess it can't have anything to do
> > with the pg_upgrade/pg_resetwal issue Tom just referenced, despite the
> > apparent similarity.
>
> Agreed.  It took me a couple of minutes to get autovacuum to run in an
> infinite loop with a standalone instance.  Nice catch, Justin!

I believe that I've found the culprit:
GetOldestNonRemovableTransactionId(rel) does not use the exact same
conditions for returning OldestXmin as GlobalVisTestFor(rel) does.
This results in different minimal XIDs, and subsequently this failure.

The attached patch fixes this inconsistency, and adds a set of asserts
to ensure that GetOldesNonRemovableTransactionId is equal to the
maybe_needed of the GlobalVisTest of that relation, plus some at
GlobalVisUpdateApply such that it will fail whenever it is called with
arguments that would move the horizons in the wrong direction. Note
that there was no problem in GlobalVisUpdateApply, but it helped me
determine that that part was not the source of the problem, and I
think that having this safeguard is a net-positive.

Another approach might be changing GlobalVisTestFor(rel) instead to
reflect the conditions in GetOldestNonRemovableTransactionId.

With attached prototype patch, I was unable to reproduce the
problematic case in 10 minutes. Without, I got the problematic
behaviour in seconds.

With regards,

Matthias
From fe5cb0430a06a44823d5b62f2f909da624505962 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Wed, 9 Jun 2021 17:26:34 +0200
Subject: [PATCH v1] Fix a bug in GetOldestNonRemovableTransactionId

GetOldestNonRemovableTransactionId(rel) did not return values consistent
with GlobalVisTestFor(rel). This is now updated, and some assertions are
added to ensure this problem case does not return.
---
 src/backend/storage/ipc/procarray.c | 39 +++--
 1 file changed, 32 insertions(+), 7 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 42a89fc5dc..7177d50351 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1944,18 +1944,25 @@ TransactionId
 GetOldestNonRemovableTransactionId(Relation rel)
 {
 	ComputeXidHorizonsResult horizons;
+	TransactionId	result;
 
 	ComputeXidHorizons(&horizons);
 
-	/* select horizon appropriate for relation */
-	if (rel == NULL || rel->rd_rel->relisshared)
-		return horizons.shared_oldest_nonremovable;
-	else if (RelationIsAccessibleInLogicalDecoding(rel))
-		return horizons.catalog_oldest_nonremovable;
+	/* select horizon appropriate for relation. Mirrored from GlobalVisTestFor */
+	if (rel == NULL || rel->rd_rel->relisshared || RecoveryInProgress())
+		result = horizons.shared_oldest_nonremovable;
+	else if (IsCatalogRelation(rel) || RelationIsAccessibleInLogicalDecoding(rel))
+		result = horizons.catalog_oldest_nonremovable;
 	else if (RELATION_IS_LOCAL(rel))
-		return horizons.temp_oldest_nonremovable;
+		result = horizons.temp_oldest_nonremovable;
 	else
-		return horizons.data_oldest_nonremovable;
+		result = horizons.data_oldest_nonremovable;
+
+	/* Sanity check */
+	Assert(TransactionIdEquals(
+		XidFromFullTransactionId(GlobalVisTestFor(rel)->maybe_needed),
+			result));
+	return result;
 }
 
 /*
@@ -4032,6 +4039,24 @@ GlobalVisTestShouldUpdate(GlobalVisState *state)
 static void
 GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons)
 {
+	/* assert non-decreasing nature of horizons */
+	Assert(FullTransactionIdFollowsOrEquals(
+			   FullXidRelativeTo(horizons->latest_completed,
+ horizons->shared_oldest_nonremovable),
+			   GlobalVisSharedRels.maybe_needed));
+	Assert(FullTransactionIdFollowsOrEquals(
+			   FullXidRelativeTo(horizons->latest_completed,
+ horizons->catalog_oldest_nonremovable),
+			   GlobalVisCatalogRels.maybe_needed));
+	Assert(FullTransactionIdFollowsOrEquals(
+			   FullXidRelativeTo(horizons->latest_completed,
+ horizons->data_oldest_nonremovable),
+			   GlobalVisDataRels.maybe_needed));
+	Assert(FullTransactionIdFollowsOrEquals(
+			   FullXidRelativeTo(horizons->latest_completed,
+ horizons->temp_oldest_nonremovable),
+			   GlobalVisTempRels.maybe_needed));
+
 	GlobalVisSharedRels.maybe_needed =
 		FullXidRelativeTo(horizons->latest_completed,
 		  horizons->shared_oldest_nonremovable);
-- 
2.20.1



Re: Decoding speculative insert with toast leaks memory

2021-06-09 Thread Alvaro Herrera
May I suggest using a different name for the blurt_and_lock_123()
function, so that it doesn't conflict with the one in
insert-conflict-specconflict?  Thanks

-- 
Álvaro Herrera               39°49'30"S 73°17'W




Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options

2021-06-09 Thread Bharath Rupireddy
On Wed, Jun 9, 2021 at 10:37 AM Peter Smith  wrote:
>
> On Wed, Jun 2, 2021 at 10:41 PM Bharath Rupireddy
>  wrote:
> >
> > On Wed, Jun 2, 2021 at 11:43 AM Peter Smith  wrote:
> > > Yes, it looks better, but (since the masks are all 1 bit) I was only
> > > asking why not do like:
> > >
> > > if (supported_opts & SUBOPT_CONNECT)
> > > if (supported_opts & SUBOPT_ENABLED)
> > > if (supported_opts & SUBOPT_SLOT_NAME)
> > > if (supported_opts & SUBOPT_COPY_DATA)
> >
> > Please review the attached v3 patch further.
>
> OK. I have applied the v3 patch and reviewed it again:
>
> - It applies OK.
> - The code builds OK.
> - The make check and TAP subscription tests are OK

Thanks.

> 1.
> +/*
> + * Structure to hold the bitmaps and values of all the options for
> + * CREATE/ALTER SUBSCRIPTION  commands.
> + */
>
> There seems to be an extra space before "commands."

Removed.

> 2.
> + /* If connect option is supported, the others also need to be. */
> + Assert(!IsSet(supported_opts, SUBOPT_CONNECT) ||
> +(IsSet(supported_opts, SUBOPT_ENABLED) &&
> + IsSet(supported_opts, SUBOPT_CREATE_SLOT) &&
> + IsSet(supported_opts, SUBOPT_COPY_DATA)));
>
> This comment about "the others" doesn’t make sense to me.
>
> e.g. Why only these 3 options? What about all those other SUBOPT_* options?

It is an existing Assert and comment for ensuring somebody doesn't
call parse_subscription_options with SUBOPT_CONNECT but without
SUBOPT_ENABLED, SUBOPT_CREATE_SLOT and SUBOPT_COPY_DATA. In other
words, when SUBOPT_CONNECT is passed in, the other three options
should also be passed. "the others" there in the comment makes sense
just by looking at the Assert statement.

> 3.
> I feel that this patch should be split into 2 parts
> a) the SubOpts changes, and
> b) the mutually exclusive options change.

Divided the patch into two.

> I agree that the new SubOpts struct etc. is an improvement over existing code.
>
> But, for the mutually exclusive options part I don't see what is
> gained by the new patch code. I preferred the old code with its
> multiple ereports. Although it was a bit repetitive IMO it was easier
> to read that way, and length-wise there is almost no difference. So if
> it is less readable and not a lot shorter then what is the benefit of
> the change?

I personally don't like the repeated code when there's a chance of
doing it better. It might not reduce the loc, but it removes the many
similar ereport(ERROR calls. PSA v4-0002 patch. I think the committer
can take a call on it.

> 4.
> - char    *slotname;
> - bool     slotname_given;
> - char    *synchronous_commit;
> - bool     binary_given;
> - bool     binary;
> - bool     streaming_given;
> - bool     streaming;
> -
> - parse_subscription_options(stmt->options,
> -    NULL,        /* no "connect" */
> -    NULL, NULL,  /* no "enabled" */
> -    NULL,        /* no "create_slot" */
> -    &slotname_given, &slotname,
> -    NULL,        /* no "copy_data" */
> -    &synchronous_commit,
> -    NULL,        /* no "refresh" */
> -    &binary_given, &binary,
> -    &streaming_given, &streaming);
> -
> - if (slotname_given)
> + SubOpts opts = {0};
>
> I feel it would be simpler to declare/init this "opts" variable just 1
> time at top of the function AlterSubscription, instead of the 6
> separate declarations in this v3 patch. Doing that can allow other
> code simplifications too. (see #5)

Done.

> 5.
>   case ALTER_SUBSCRIPTION_DROP_PUBLICATION:
>   {
>   bool isadd = stmt->kind == ALTER_SUBSCRIPTION_ADD_PUBLICATION;
> - bool copy_data;
> - bool refresh;
>   List*publist;
> + SubOpts opts = {0};
> +
> + opts.supported_opts |= SUBOPT_REFRESH;
> +
> + if (isadd)
> + opts.supported_opts |= SUBOPT_COPY_DATA;
>
> I think having a separate "isadd" variable is made moot now since
> adding the SubOpts struct.
>
> Instead you can do this:
> + if (stmt->kind == ALTER_SUBSCRIPTION_ADD_PUBLICATION)
> + opts.supported_opts |= SUBOPT_COPY_DATA;
>
> OR (after #4) you could do this:
>
> case ALTER_SUBSCRIPTION_ADD_PUBLICATION:
> opts.supported_opts |= SUBOPT_COPY_DATA;
> /* fall thru. */
> case ALTER_SUBSCRIPTION_DROP_PUBLICATION:

Done.

> 6.
> +
> +#define IsSet(val, option)  ((val & option) != 0)
> +
>
> Your IsSet macro might be better if changed to test *multiple* bits are all 
> set.
>
> Like this:
> #define IsSet(val, bits)  ((val & (bits)) == (bits))
>
> ~
>
> Most of the code remains the same, but some can be simplified.
> e.g.
> + /* If connect option is supported, the others also need to be. */
> + Assert(!IsSet(supported_opts, SUBOPT_CONNECT) ||
> +(IsSet(supported_opts, SUBOPT_ENABLED) &&
> + IsSet(supported_opts, SUBOPT_CREATE_SLOT) &&
> + IsSet(supported_opts, SUBOPT_COPY_DATA)));
>
> Becomes:
> Assert(!IsSet(supported_opts, SUBOPT_CONNECT) ||
>IsSet(supported_opts, SUBOPT_ENABLED|SUBOPT_CREATE_SLOT|SUBOPT_COPY_DATA));

Changed.

PSA v4 patch set.

With Regards,
Bharath Rupireddy.
From 57e8189a43bffa3b3464e8b878ed18b6c354a4a8 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 9 Jun 2021 07:37:41 -0700
Subject: [PATCH v4] Refactor 

Re: logical replication of truncate command with trigger causes Assert

2021-06-09 Thread Tom Lane
Mark Dilger  writes:
>> On Jun 9, 2021, at 7:52 AM, Tom Lane  wrote:
>> Somewhat unrelated, but ... am I reading the code correctly that
>> apply_handle_stream_start and related routines are using Asserts
>> to check that the remote sent stream-control messages in the correct
>> order?  That seems many degrees short of acceptable.

> Even if you weren't reading that correctly, this bit:

> xid = pq_getmsgint(s, 4);

> Assert(TransactionIdIsValid(xid));

> simply asserts that the sending server didn't send an invalid subtransaction 
> id.

Ugh, yeah.  We should never be using Asserts to validate incoming
messages -- a test-and-elog is more appropriate.

regards, tom lane




Re: logical replication of truncate command with trigger causes Assert

2021-06-09 Thread Mark Dilger



> On Jun 9, 2021, at 7:52 AM, Tom Lane  wrote:
> 
> Here's a draft patch for that.  I decided the most sensible way to
> organize this is to pair the existing ensure_transaction() subroutine
> with a cleanup subroutine.  Rather unimaginatively, perhaps, I renamed
> it to begin_transaction_step and named the cleanup end_transaction_step.
> (Better ideas welcome.)

Thanks!  The regression test I posted earlier passes with this patch applied.

> Somewhat unrelated, but ... am I reading the code correctly that
> apply_handle_stream_start and related routines are using Asserts
> to check that the remote sent stream-control messages in the correct
> order?  That seems many degrees short of acceptable.

Even if you weren't reading that correctly, this bit:

xid = pq_getmsgint(s, 4);

Assert(TransactionIdIsValid(xid));

simply asserts that the sending server didn't send an invalid subtransaction id.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: logical replication of truncate command with trigger causes Assert

2021-06-09 Thread Tom Lane
Amit Kapila  writes:
> On Wed, Jun 9, 2021 at 5:29 AM Tom Lane  wrote:
>> 2. Decide that we ought to ensure that a snapshot exists throughout
>> most of this code.  It's not entirely obvious to me that there is no
>> code path reachable from, say, apply_handle_truncate's collection of
>> relation OIDs that needs a snapshot.  If we went for that, I'd think
>> the right solution is to do PushActiveSnapshot right after each
>> ensure_transaction call, and then PopActiveSnapshot on the way out of
>> the respective subroutine.  We could then drop the snapshot management
>> calls that are currently associated with the executor state.

> +1 for the second option as with that, apart from what you said it
> will take off some load from future developers to decide which part of
> changes should be after acquiring snapshot.

Here's a draft patch for that.  I decided the most sensible way to
organize this is to pair the existing ensure_transaction() subroutine
with a cleanup subroutine.  Rather unimaginatively, perhaps, I renamed
it to begin_transaction_step and named the cleanup end_transaction_step.
(Better ideas welcome.)

As written, this'll result in creating and deleting a snapshot for some
stream-control messages that maybe don't need one; but the point here is
not to have to think too hard about whether they do, so that's OK with
me.  There are more CommandCounterIncrement calls than before, too,
but (a) those are cheap if there's nothing to do and (b) it's not real
clear to me that the extra calls are not necessary.

Somewhat unrelated, but ... am I reading the code correctly that
apply_handle_stream_start and related routines are using Asserts
to check that the remote sent stream-control messages in the correct
order?  That seems many degrees short of acceptable.

regards, tom lane

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 6ba447ea97..9e9b47ce4f 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -282,30 +282,41 @@ should_apply_changes_for_rel(LogicalRepRelMapEntry *rel)
 }
 
 /*
- * Make sure that we started local transaction.
+ * Begin one step (one INSERT, UPDATE, etc) of a replication transaction.
  *
- * Also switches to ApplyMessageContext as necessary.
+ * Start a transaction, if this is the first step (else we keep using the
+ * existing transaction).
+ * Also provide a global snapshot and ensure we run in ApplyMessageContext.
  */
-static bool
-ensure_transaction(void)
+static void
+begin_transaction_step(void)
 {
-	if (IsTransactionState())
-	{
-		SetCurrentStatementStartTimestamp();
-
-		if (CurrentMemoryContext != ApplyMessageContext)
-			MemoryContextSwitchTo(ApplyMessageContext);
+	SetCurrentStatementStartTimestamp();
 
-		return false;
+	if (!IsTransactionState())
+	{
+		StartTransactionCommand();
+		maybe_reread_subscription();
 	}
 
-	SetCurrentStatementStartTimestamp();
-	StartTransactionCommand();
-
-	maybe_reread_subscription();
+	PushActiveSnapshot(GetTransactionSnapshot());
 
 	MemoryContextSwitchTo(ApplyMessageContext);
-	return true;
+}
+
+/*
+ * Finish up one step of a replication transaction.
+ * Callers of begin_transaction_step() must also call this.
+ *
+ * We don't close out the transaction here, but we should increment
+ * the command counter to make the effects of this step visible.
+ */
+static void
+end_transaction_step(void)
+{
+	PopActiveSnapshot();
+
+	CommandCounterIncrement();
 }
 
 /*
@@ -359,13 +370,6 @@ create_edata_for_relation(LogicalRepRelMapEntry *rel)
 	RangeTblEntry *rte;
 	ResultRelInfo *resultRelInfo;
 
-	/*
-	 * Input functions may need an active snapshot, as may AFTER triggers
-	 * invoked during finish_edata.  For safety, ensure an active snapshot
-	 * exists throughout all our usage of the executor.
-	 */
-	PushActiveSnapshot(GetTransactionSnapshot());
-
 	edata = (ApplyExecutionData *) palloc0(sizeof(ApplyExecutionData));
 	edata->targetRel = rel;
 
@@ -433,8 +437,6 @@ finish_edata(ApplyExecutionData *edata)
 	ExecResetTupleTable(estate->es_tupleTable, false);
 	FreeExecutorState(estate);
 	pfree(edata);
-
-	PopActiveSnapshot();
 }
 
 /*
@@ -831,7 +833,7 @@ apply_handle_stream_start(StringInfo s)
 	 * transaction for handling the buffile, used for serializing the
 	 * streaming data and subxact info.
 	 */
-	ensure_transaction();
+	begin_transaction_step();
 
 	/* notify handle methods we're processing a remote transaction */
 	in_streamed_transaction = true;
@@ -861,6 +863,8 @@ apply_handle_stream_start(StringInfo s)
 		subxact_info_read(MyLogicalRepWorker->subid, stream_xid);
 
 	pgstat_report_activity(STATE_RUNNING, NULL);
+
+	end_transaction_step();
 }
 
 /*
@@ -937,7 +941,7 @@ apply_handle_stream_abort(StringInfo s)
 		StreamXidHash *ent;
 
 		subidx = -1;
-		ensure_transaction();
+		begin_transaction_step();
 		subxact_info_read(MyLogicalRepWorker->subid, xid);
 
 		for (i = subxact_data.nsubxacts; i > 0; i--)

Re: How to pass a parameter in a query to postgreSQL 11 (offtopic)

2021-06-09 Thread Justin Pryzby
On Wed, Jun 09, 2021 at 05:30:15AM -0500, Hassan Camacho Cadre wrote:
> I recently migrated from version 8.3 of postgreSQL to v11, previously in
> all my queries for passing parameters I used the character :
> Example
> Where id =: searched

I guess you migrated to a whole new environment, with many new package
versions, not just postgres ?

We don't know how you're issuing queries, but I'm guessing some other
application is what changed.

Postgres uses $1 for query parameters, in both v8.3 and in v11.
https://www.postgresql.org/docs/8.3/libpq-exec.html
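
For example, at the SQL level (table and statement names here are only
illustrative):

    PREPARE find_row (integer) AS
        SELECT * FROM mytable WHERE id = $1;
    EXECUTE find_row (42);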

BTW, this is the list for development of postgres itself.  It's much too busy
to also answer other questions.  Please raise the question on the -general
list, with information about your environment.
https://www.postgresql.org/list/

Thanks,
-- 
Justin

> In the new version when I try to make this query it sends me an error
> 
> ERROR syntax error at or near ":"
> 
> Could someone help me to know how I can configure the parameter passing
> character or, failing that, how I should pass the parameters in this new
> version.




Re: DELETE CASCADE

2021-06-09 Thread David Christensen
On Wed, Jun 9, 2021 at 8:48 AM David G. Johnston 
wrote:

> On Wednesday, June 9, 2021, Peter Eisentraut <
> peter.eisentr...@enterprisedb.com> wrote:
>
>>
>> It might work, I'm just saying it needs to be thought about carefully. If
>> you have functionality like, delete this if there is no matching record
>> over there, you need to have the permission to check that and need to make
>> sure it stays that way.
>>
>>
> Which I believe the presence of an existing foreign key does quite
> nicely.  Thus if the executing user is the table owner (group membership
> usually) and a FK already exists, the conditions for the cascade are
> fulfilled, including locking I would think, because that FK could have been
> defined to just do it without all this.  We are effectively just
> temporarily changing that aspect of the foreign key - the behavior should
> be identical to on cascade delete.
>

I think Peter is referring to the DELETE RESTRICT proposed mirror behavior
in this specific case, not DELETE CASCADE specifically.


>  I require convincing that there is a use case that requires laxer
> permissions.  Especially if we can solve the whole changing of the cascade
> option without having to drop the foreign key.
>

This was my original feeling as well, though really, if I were going to run
this operation it would likely be as the database owner or superuser anyway,
so my desire to make this work in all situations is tempered by my desire
to just have the basic functionality available at *some* level. :-)

David


Re: Fdw batch insert error out when set batch_size > 65535

2021-06-09 Thread Tomas Vondra

On 6/9/21 3:28 PM, Tom Lane wrote:

Tomas Vondra  writes:

Note that the problem here is [1] - we're creating a lot of slots
referencing the same tuple descriptor, which inflates the duration.
There's a fix in the other thread, which eliminates ~99% of the
overhead. I plan to push that fix soon (a day or two).


Oh, okay, as long as there's a plan to bring the time back down.



Yeah. Sorry for not mentioning this in the original message about the 
new regression test.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: RFC: Logging plan of the running query

2021-06-09 Thread Fujii Masao




On 2021/06/09 16:44, torikoshia wrote:

Updated the patch.


Thanks for updating the patch!

auto_explain can log the plan of even a nested statement
if auto_explain.log_nested_statements is enabled.
But ISTM that pg_log_current_plan() cannot log that plan.
Is this intentional?
I think that it's better to make pg_log_current_plan() log
the plan of even a nested statement.


+   es->format = EXPLAIN_FORMAT_TEXT;
+   es->settings = true;

Since pg_log_current_plan() is usually used to investigate and
troubleshoot long-running queries, IMO it's better to
enable es->verbose by default and get additional information
about the queries. Thoughts?
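
For example, something like this (a sketch only; assuming, as with
pg_log_backend_memory_contexts(), that the function takes the target
backend's PID):

    SELECT pg_log_current_plan(pid)
      FROM pg_stat_activity
     WHERE state = 'active' AND pid <> pg_backend_pid();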


+ * pg_log_current_plan
+ * Signal a backend process to log plan the of running query.

"plan the of" is typo?


+ * Only superusers are allowed to signal to log plan because any users to
+ * issue this request at an unbounded rate would cause lots of log messages
+ * and which can lead to denial of service.

"because any users" should be "because allowing any users"
like the comment for pg_log_backend_memory_contexts()?


+ * All the actual work is deferred to ProcessLogExplainInterrupt(),

"ProcessLogExplainInterrupt()" should be "ProcessLogCurrentPlanInterrupt()"?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Fix dropped object handling in pg_event_trigger_ddl_commands

2021-06-09 Thread Alvaro Herrera
On 2021-Jun-09, Michael Paquier wrote:

> On Mon, Jun 07, 2021 at 12:44:42PM +0300, Aleksander Alekseev wrote:
> > I confirm that the bug still exists in master (be57f216). The patch
> > fixes it and looks good to me. I changed the wording a little and
> > added a regression test. The updated patch is in the attachment. I
> > added this thread to the CF and updated the status to "Ready for
> > Committer".
> 
> FWIW, that looks rather natural to me to just ignore the object
> if it has already been dropped here.  The same kind of rules apply to
> tables dropped with DROP TABLE, which would not show up in
> pg_event_trigger_ddl_commands(), but one can get a list from
> pg_event_trigger_dropped_objects().

Oh, that parallel didn't occur to me.  I agree it seems a useful
precedent.

> Alvaro, were you planning to look at that?  I have not looked at the
> patch in detail.

I have it on my list of things to look at, but it isn't a priority.  If
you want to mess with it, please be my guest.

> missing_ok is available in getObjectIdentity() only
> since v14, so this cannot be backpatched :/

Ooh, yeah, I forgot about that.  And that one was pretty invasive ...

I'm not sure if we can reasonably implement a fix for older releases.
I mean, it's a relatively easy test: do a syscache search for the object
or a catalog indexscan (easy to do with get_object_property_data-based
API), and if the object is gone, skip getObjectTypeDescription and
getObjectIdentity.  But maybe this is too much code to add to stable
releases ...

-- 
Álvaro Herrera   Valdivia, Chile




Re: DELETE CASCADE

2021-06-09 Thread David G. Johnston
On Wednesday, June 9, 2021, Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

>
> It might work, I'm just saying it needs to be thought about carefully. If
> you have functionality like, delete this if there is no matching record
> over there, you need to have the permission to check that and need to make
> sure it stays that way.
>
>
Which I believe the presence of an existing foreign key does quite nicely.
Thus if the executing user is the table owner (group membership usually)
and a FK already exists, the conditions for the cascade are fulfilled,
including locking I would think, because that FK could have been defined to
just do it without all this.  We are effectively just temporarily changing
that aspect of the foreign key - the behavior should be identical to on
cascade delete.
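
For comparison, the permanent form that this would temporarily emulate is just
(a sketch; table and constraint names are made up):

    ALTER TABLE child
        DROP CONSTRAINT child_parent_fkey,
        ADD CONSTRAINT child_parent_fkey
            FOREIGN KEY (parent_id) REFERENCES parent (id) ON DELETE CASCADE;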

 I require convincing that there is a use case that requires laxer
permissions.  Especially if we can solve the whole changing of the cascade
option without having to drop the foreign key.

David J.


Re: Decoding speculative insert with toast leaks memory

2021-06-09 Thread Dilip Kumar
On Wed, Jun 9, 2021 at 4:22 PM Amit Kapila  wrote:

> On Wed, Jun 9, 2021 at 4:12 PM Dilip Kumar  wrote:
> >> Few comments:
> >> 1. The test has a lot of similarities and test duplication with what
> >> we are doing in insert-conflict-specconflict.spec. Can we move it to
> >> insert-conflict-specconflict.spec? I understand that having it in
> >> test_decoding has the advantage that we can have all decoding tests in
> >> one place but OTOH, we can avoid a lot of test-code duplication if we
> >> add it in insert-conflict-specconflict.spec.
> >>
> >
> > It seems the isolation test runs on the default configuration; would it
> be a good idea to change the wal_level to logical for the whole isolation
> tester folder?
> >
>
> No, that doesn't sound like a good idea to me. Let's keep it in
> test_decoding then.
>
>
Okay, I will work on the remaining comments and back patches and send it by
tomorrow.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Adjust pg_regress output for new long test names

2021-06-09 Thread Alvaro Herrera
On 2021-Jun-08, Noah Misch wrote:

> On Wed, Jun 09, 2021 at 03:21:36PM +1200, Thomas Munro wrote:
> > On Wed, Jun 9, 2021 at 2:44 PM Tom Lane  wrote:
> > > ... or we could shorten those file names.  I recall an episode
> > > awhile ago where somebody complained that their version of "tar"
> > > couldn't handle some of the path names in our tarball, so
> > > keeping things from getting to carpal-tunnel-inducing lengths
> > > does have its advantages.

Sure.  I'm also the author of tuplelock-upgrade-no-deadlock -- see
commit de87a084c0a5.  (Oleksii submitted it as "rowlock-upgrade-deadlock").
We could rename that one too while at it.

> > On Wed, Jun 9, 2021 at 2:51 PM Noah Misch  wrote:
> > > Not bad, but I would instead shorten the names to detach-[1234] or
> > > detach-partition-[1234].  The marginal value of the second word is low, 
> > > and
> > > the third word helps even less.
> 
> Better still, the numbers can change to something descriptive:
> 
> detach-1 => detach-visibility
> detach-2 => detach-fk-FOO
> detach-3 => detach-incomplete
> detach-4 => detach-fk-BAR
> 
> I don't grasp the difference between -2 and -4 enough to suggest concrete FOO
> and BAR words.

Looking at -2, it looks like a very small subset of -4.  I probably
wrote it first and failed to realize I could extend that one rather than
create -4.  We could just delete it.

We also have partition-concurrent-attach.spec; what if we make
everything a consistent set?  We could have

partition-attach
partition-detach-visibility (-1)
partition-detach-incomplete (-3)
partition-detach-fk (-4)

-- 
Álvaro Herrera               39°49'30"S 73°17'W




Re: Fdw batch insert error out when set batch_size > 65535

2021-06-09 Thread Tom Lane
Tomas Vondra  writes:
> Note that the problem here is [1] - we're creating a lot of slots 
> referencing the same tuple descriptor, which inflates the duration. 
> There's a fix in the other thread, which eliminates ~99% of the 
> overhead. I plan to push that fix soon (a day or two).

Oh, okay, as long as there's a plan to bring the time back down.

regards, tom lane




Re: Logical replication keepalive flood

2021-06-09 Thread Abbas Butt
Hi,

On Wed, Jun 9, 2021 at 2:30 PM Amit Kapila  wrote:

> On Wed, Jun 9, 2021 at 1:47 PM Kyotaro Horiguchi
>  wrote:
> >
> > At Wed, 9 Jun 2021 11:21:55 +0900, Kyotaro Horiguchi <
> horikyota@gmail.com> wrote in
> > > The issue - if actually it is - we send a keep-alive packet before a
> > > quite short sleep.
> > >
> > > We really want to send it if the sleep gets long but we cannot predict
> > > that before entering a sleep.
> > >
> > > Let me think a little more on this..
> >
> > After some investigation, I find out that the keepalives are sent
> > almost always after XLogSendLogical requests for the *next* record.
> >
>
> Are these keepalive messages sent at the same frequency even for
> subscribers?


Yes, I have tested it with one publisher and one subscriber.
The moment I start a pgbench session I can see keepalive messages sent and
replied to by the subscriber with the same frequency.


> Basically, I wanted to check: if we have logical
> replication set up between 2 nodes, do we send this flood of keep-alive
> messages?


Yes we do.


> If not, then why is it different in the case of
> pg_recvlogical?


Nothing, the WAL sender behaviour is same in both cases.


> Is it possible that the write/flush location is not
> updated at the pace at which we expect?


Well, it is async replication. The receiver can choose to update LSNs at
will, say at a 10-minute interval.
It should only impact the size of WAL retained by the server.
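
(E.g. the effect is visible in the slot's positions:

    SELECT slot_name, restart_lsn, confirmed_flush_lsn
      FROM pg_replication_slots;

restart_lsn only advances once the receiver reports progress.)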

> Please see commit 41d5f8ad73
> which seems to be talking about a similar problem.
>

That commit does not address this problem.


>
> --
> With Regards,
> Amit Kapila.
>


-- 
-- 
*Abbas*
Senior Architect


Ph: 92.334.5100153
Skype ID: gabbasb
edbpostgres.com

*Follow us on Twitter*
@EnterpriseDB


Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

2021-06-09 Thread Amit Kapila
On Wed, Jun 9, 2021 at 4:50 PM Amit Kapila  wrote:
>
> On Wed, Jun 9, 2021 at 1:53 AM Jeff Davis  wrote:
> >
> > On Tue, 2021-06-08 at 17:41 +1000, Ajin Cherian wrote:
> > > Here's an updated patchset that adds back in the option for two-phase
> > > in CREATE_REPLICATION_SLOT command and a second patch that adds
> > > support for
> > > two-phase decoding in pg_recvlogical.
> >
> > A few things:
> >
> > * I suggest putting the TWO_PHASE keyword after the LOGICAL keyword
> >
>
> Isn't it better to add it after LOGICAL IDENT? In docs [1], we expect
> that way. Also, see code in libpqrcv_create_slot where we expect them
> to be together but we can change that if you still prefer to add it
> after LOGICAL. BTW, can't we consider it to be part of
> create_slot_opt_list?
>
> Also, it might be good if we can add a test in
> src/bin/pg_basebackup/t/030_pg_recvlogical
>

Some more points:
1. pg_recvlogical can only send two_phase option if
(PQserverVersion(conn) >= 14), otherwise, it won't work for older
versions of the server.
2. In the main patch [1], we do send two_phase option even during
START_REPLICATION for the very first time when the two_phase can be
enabled. There are reasons as described in the worker.c why we can't
enable it during CREATE_REPLICATION_SLOT. Now, if we want to do
protocol changes, I wonder why only do some changes and leave the rest
for the next version?

[1] - 
https://www.postgresql.org/message-id/CAHut%2BPt7wnctZpfhaLyuPA0mXDAtuw7DsMUfw3TePJLxqTArjA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Case expression pushdown

2021-06-09 Thread Alexander Pyhalov

Hi.

This patch allows pushing CASE expressions to foreign servers, so that
more types of updates can be executed directly.


For example, without patch:

EXPLAIN (VERBOSE, COSTS OFF)
UPDATE ft2 d SET c2 = CASE WHEN c2 > 0 THEN c2 ELSE 0 END
WHERE c1 > 1000;
                              QUERY PLAN
 ------------------------------------------------------------------
  Update on public.ft2 d
    Remote SQL: UPDATE "S 1"."T 1" SET c2 = $2 WHERE ctid = $1
    ->  Foreign Scan on public.ft2 d
          Output: CASE WHEN (c2 > 0) THEN c2 ELSE 0 END, ctid, d.*
          Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 1000)) FOR UPDATE



With the patch:

EXPLAIN (VERBOSE, COSTS OFF)
UPDATE ft2 d SET c2 = CASE WHEN c2 > 0 THEN c2 ELSE 0 END
WHERE c1 > 1000;
                              QUERY PLAN
 ------------------------------------------------------------------
  Update on public.ft2 d
    ->  Foreign Update on public.ft2 d
          Remote SQL: UPDATE "S 1"."T 1" SET c2 = (CASE WHEN (c2 > 0) THEN c2 ELSE 0 END) WHERE (("C 1" > 1000))



--
Best regards,
Alexander Pyhalov,
Postgres Professional
From 19202bfa5ba8a7eadf1a3b0ce869106e967e0dd2 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov 
Date: Tue, 30 Mar 2021 13:24:14 +0300
Subject: [PATCH] Allow pushing CASE expression to foreign server

---
 contrib/postgres_fdw/deparse.c| 124 ++
 .../postgres_fdw/expected/postgres_fdw.out|  28 
 contrib/postgres_fdw/sql/postgres_fdw.sql |  16 +++
 3 files changed, 168 insertions(+)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c6..4e8162c045c 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -87,6 +87,7 @@ typedef struct foreign_loc_cxt
 {
 	Oid			collation;		/* OID of current collation, if any */
 	FDWCollateState state;		/* state of current collation choice */
+	List  *case_args;   /* list of case args to inspect */
 } foreign_loc_cxt;
 
 /*
@@ -101,6 +102,7 @@ typedef struct deparse_expr_cxt
  * a base relation. */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	List	  *case_args;	/* list of args to deparse CaseTestExpr */
 } deparse_expr_cxt;
 
 #define REL_ALIAS_PREFIX	"r"
@@ -186,6 +188,9 @@ static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 	deparse_expr_cxt *context);
 
+static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context);
+static void deparseCaseTestExpr(CaseTestExpr *node, deparse_expr_cxt *context);
+
 /*
  * Helper functions
  */
@@ -254,6 +259,7 @@ is_foreign_expr(PlannerInfo *root,
 		glob_cxt.relids = baserel->relids;
 	loc_cxt.collation = InvalidOid;
 	loc_cxt.state = FDW_COLLATE_NONE;
+	loc_cxt.case_args = NIL;
 	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
 		return false;
 
@@ -312,6 +318,7 @@ foreign_expr_walker(Node *node,
 	/* Set up inner_cxt for possible recursion to child nodes */
 	inner_cxt.collation = InvalidOid;
 	inner_cxt.state = FDW_COLLATE_NONE;
+	inner_cxt.case_args = outer_cxt->case_args;
 
 	switch (nodeTag(node))
 	{
@@ -509,6 +516,69 @@ foreign_expr_walker(Node *node,
 	state = FDW_COLLATE_UNSAFE;
 			}
 			break;
+		case T_CaseExpr:
+			{
+CaseExpr   *ce = (CaseExpr *) node;
+ListCell   *arg;
+
+if (ce->arg)
+{
+	inner_cxt.case_args = lappend(inner_cxt.case_args, ce->arg);
+}
+
+foreach(arg, ce->args)
+{
+	CaseWhen   *w = lfirst_node(CaseWhen, arg);
+
+	if (!foreign_expr_walker((Node *) w->expr,
+		glob_cxt, &inner_cxt))
+		return false;
+
+	if (!foreign_expr_walker((Node *) w->result,
+		glob_cxt, &inner_cxt))
+		return false;
+}
+
+if (!foreign_expr_walker((Node *) ce->defresult,
+		 glob_cxt, &inner_cxt))
+	return false;
+
+if (ce->arg)
+{
+	inner_cxt.case_args = list_delete_last(inner_cxt.case_args);
+	outer_cxt->case_args = inner_cxt.case_args;
+}
+
+collation = ce->casecollid;
+if (collation == InvalidOid)
+	state = FDW_COLLATE_NONE;
+else if (inner_cxt.state == FDW_COLLATE_SAFE &&
+		 collation == inner_cxt.collation)
+	state = FDW_COLLATE_SAFE;
+else if (collation == DEFAULT_COLLATION_OID)
+	state = FDW_COLLATE_NONE;
+else
+	state = FDW_COLLATE_UNSAFE;
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+Expr *arg;
+
+Assert(outer_cxt->case_args != NIL);
+arg = llast(outer_cxt->case_args);
+
+if (!foreign_expr_walker((Node *) arg,
+glob_cxt, &inner_cxt))
+	return false;
+
+/*
+ * Collation and state just bubble up from the previously saved case argument
+ */
+collation = inner_cxt.collation;
+state =  

Re: when the startup process doesn't

2021-06-09 Thread Nitin Jadhav
> Should it show the rusage ?  It's shown at startup completion since 10a5b35a0,
> so it seems strange not to show it here.

> I don't know, that seems like it's going to make the messages awfully
> long, and I'm not sure of what use it is to see that for every report.

I have not changed anything with respect to this. If it is really
required, then I will change it.

> +   log_startup_process_progress("Syncing data 
> directory", path, false);

> I think the fsync vs syncfs paths should be distinguished: "Syncing data
> directory (fsync)" vs "Syncing data directory (syncfs)".

Fixed.

> +   {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN,
>
> I think it should be PGC_SIGHUP, to allow changing it during runtime.
> Obviously it has no effect except during startup, but the change will be
> effective if the current process crashes.
> See also: 
> https://www.postgresql.org/message-id/20210526001359.ge3...@telsasoft.com

I did not understand exactly how it will change the behaviour. In my
understanding, when the server restarts after a crash, it fetches the
value from the config file, so any change takes effect then. Kindly
correct me if I am wrong.

> +extern void log_startup_process_progress(char *operation, void *data,
> + bool is_complete);
>
> I think this should take an enum operation, rather than using strcmp() on it
> later.  The enum values might be RECOVERY_START, RECOVERY_END, rather than
> having a bool is_complete.

Fixed.

> I don't like the name very much. log_min_duration_startup_process
> seems to have been chosen to correspond to log_min_duration_statement,
> but the semantics are different. That one is a threshold, whereas this
> one is an interval. Maybe something like
> log_startup_progress_interval?

Yes. This looks more appropriate. Fixed in the attached patch.

> As far as the patch itself goes, I think that the overhead of this
> approach is going to be unacceptably high. I was imagining having a
> timer running in the background that fires periodically, with the
> interval handler just setting a flag. Then in the foreground we just
> need to check whether the flag is set. I doubt that we can get away
> with a GetCurrentTimestamp() after applying every WAL record ... that
> seems like it will be slow.

Thanks for correcting me. This approach is far better than what I had
used earlier. I have done the code changes as per your approach in the
attached patch.

Kindly let me know if any changes are required.
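
For reference, a minimal sketch of the timer-plus-flag pattern on top of
utils/misc/timeout.c (the names, and the assumption that the GUC stores
milliseconds, are mine and not necessarily what the attached patch uses):

    static volatile sig_atomic_t startup_progress_timer_expired = false;
    static TimeoutId startup_progress_timeout_id;

    static void
    startup_progress_timeout_handler(void)
    {
        startup_progress_timer_expired = true;
    }

    /* once, before entering the redo loop */
    startup_progress_timeout_id =
        RegisterTimeout(USER_TIMEOUT, startup_progress_timeout_handler);
    enable_timeout_after(startup_progress_timeout_id,
                         log_startup_progress_interval);

    /* in the redo loop: a cheap flag test instead of GetCurrentTimestamp() */
    if (startup_progress_timer_expired)
    {
        startup_progress_timer_expired = false;
        ereport(LOG, (errmsg("redo in progress, elapsed time ...")));
        enable_timeout_after(startup_progress_timeout_id,
                             log_startup_progress_interval);
    }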

Thanks & Regards,
Nitin Jadhav

On Mon, Jun 7, 2021 at 7:12 PM Tom Lane  wrote:
>
> Robert Haas  writes:
> > ... I doubt that we can get away
> > with a GetCurrentTimestamp() after applying every WAL record ... that
> > seems like it will be slow.
>
> Yeah, that's going to be pretty awful even on machines with fast
> gettimeofday, never mind ones where it isn't.
>
> It should be possible to use utils/misc/timeout.c to manage the
> interrupt, I'd think.
>
> regards, tom lane


v2_startup_process_progress.patch
Description: Binary data


Re: postgres_fdw batching vs. (re)creating the tuple slots

2021-06-09 Thread Bharath Rupireddy
On Wed, Jun 9, 2021 at 4:38 PM Tomas Vondra
 wrote:
>
> On 6/9/21 12:50 PM, Bharath Rupireddy wrote:
> > On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra
> >  wrote:
> >>
> >> Hi,
> >>
> >> Here's a v2 fixing a silly bug with reusing the same variable in two
> >> nested loops (worked for simple postgres_fdw cases, but "make check"
> >> failed).
> >
> > I applied these patches and ran make check in postgres_fdw contrib
> > module, I saw a server crash. Is it the same failure you were saying
> > above?
> >
>
> Nope, that was causing an infinite loop. This is just a silly mistake on my
> side - I forgot to replace the i/j variable inside the loop. Here's v3.

Thanks. The postgres_fdw regression test execution time does not
increase much with the patches, even with the test case added by the
commit below. With and without the patches attached in this thread, the
execution times are 5 sec and 17 sec respectively. So, essentially,
these patches reduce the execution time for the test case added by
that commit.

commit cb92703384e2bb3fa0a690e5dbb95ad333c2b44c
Author: Tomas Vondra 
Date:   Tue Jun 8 20:22:18 2021 +0200

Adjust batch size in postgres_fdw to not use too many parameters

With Regards,
Bharath Rupireddy.




Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

2021-06-09 Thread Amit Kapila
On Wed, Jun 9, 2021 at 1:53 AM Jeff Davis  wrote:
>
> On Tue, 2021-06-08 at 17:41 +1000, Ajin Cherian wrote:
> > Here's an updated patchset that adds back in the option for two-phase
> > in CREATE_REPLICATION_SLOT command and a second patch that adds
> > support for
> > two-phase decoding in pg_recvlogical.
>
> A few things:
>
> * I suggest putting the TWO_PHASE keyword after the LOGICAL keyword
>

Isn't it better to add it after LOGICAL IDENT? In docs [1], we expect
that way. Also, see code in libpqrcv_create_slot where we expect them
to be together but we can change that if you still prefer to add it
after LOGICAL. BTW, can't we consider it to be part of
create_slot_opt_list?

Also, it might be good if we can add a test in
src/bin/pg_basebackup/t/030_pg_recvlogical

[1] - https://www.postgresql.org/docs/devel/logicaldecoding-walsender.html

-- 
With Regards,
Amit Kapila.




Re: postgres_fdw batching vs. (re)creating the tuple slots

2021-06-09 Thread Tomas Vondra



On 6/9/21 12:50 PM, Bharath Rupireddy wrote:

On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra
 wrote:


Hi,

Here's a v2 fixing a silly bug with reusing the same variable in two
nested loops (worked for simple postgres_fdw cases, but "make check"
failed).


I applied these patches and ran make check in postgres_fdw contrib
module, I saw a server crash. Is it the same failure you were saying
above?



Nope, that was causing an infinite loop. This is just a silly mistake on my 
side - I forgot to replace the i/j variable inside the loop. Here's v3.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From 494018fd3f2b983be474a85fc12fe3a4dbefa76b Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Fri, 4 Jun 2021 12:45:18 +0200
Subject: [PATCH 1/2] create copy of a descriptor for batching

---
 src/backend/executor/nodeModifyTable.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 379b056310..c287a371a1 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -678,6 +678,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelInfo->ri_BatchSize > 1)
 		{
+			TupleDesc tdesc;
+
 			/*
 			 * If a certain number of tuples have already been accumulated, or
 			 * a tuple has come for a different relation than that for the
@@ -703,13 +705,16 @@ ExecInsert(ModifyTableState *mtstate,
 	 resultRelInfo->ri_BatchSize);
 			}
 
+			tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
+
 			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
-MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+MakeSingleTupleTableSlot(tdesc,
 		 slot->tts_ops);
 			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
 		 slot);
+
 			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
-MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+MakeSingleTupleTableSlot(tdesc,
 		 planSlot->tts_ops);
 			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
 		 planSlot);
-- 
2.31.1

>From 9290d7a90d9c93521ae531768055b5c567fcdc51 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Fri, 4 Jun 2021 12:59:47 +0200
Subject: [PATCH 2/2] initialize slots only once for batching

---
 src/backend/executor/nodeModifyTable.c | 43 ++
 src/include/nodes/execnodes.h  |  1 +
 2 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index c287a371a1..91971ab041 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -678,8 +678,6 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelInfo->ri_BatchSize > 1)
 		{
-			TupleDesc tdesc;
-
 			/*
 			 * If a certain number of tuples have already been accumulated, or
 			 * a tuple has come for a different relation than that for the
@@ -705,19 +703,25 @@ ExecInsert(ModifyTableState *mtstate,
 	 resultRelInfo->ri_BatchSize);
 			}
 
-			tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
+			/* initialize the slot, if not already done */
+			if (resultRelInfo->ri_NumSlots >= resultRelInfo->ri_BatchInitialized)
+			{
+TupleDesc tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
 
-			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
-MakeSingleTupleTableSlot(tdesc,
-		 slot->tts_ops);
-			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
-		 slot);
+resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
+	MakeSingleTupleTableSlot(tdesc,
+			 slot->tts_ops);
+ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+			 slot);
 
-			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
-MakeSingleTupleTableSlot(tdesc,
-		 planSlot->tts_ops);
-			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
-		 planSlot);
+resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
+	MakeSingleTupleTableSlot(tdesc,
+			 planSlot->tts_ops);
+ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+			 planSlot);
+
+resultRelInfo->ri_BatchInitialized++;
+			}
 
 			resultRelInfo->ri_NumSlots++;
 
@@ -1039,12 +1043,6 @@ ExecBatchInsert(ModifyTableState *mtstate,
 
 	if (canSetTag && numInserted > 0)
 		estate->es_processed += numInserted;
-
-	for (i = 0; i < numSlots; i++)
-	{
-		ExecDropSingleTupleTableSlot(slots[i]);
-		ExecDropSingleTupleTableSlot(planSlots[i]);
-	}
 }
 
 /* 
@@ -3167,6 +3165,7 @@ ExecEndModifyTable(ModifyTableState *node)
 	 */
 	for (i = 0; i < node->mt_nrels; i++)
 	{
+		int j;
 		ResultRelInfo *resultRelInfo = node->resultRelInfo + i;
 
 		if (!resultRelInfo->ri_usesFdwDirectModify &&
@@ -3174,6 +3173,12 

Re: speed up verifying UTF-8

2021-06-09 Thread Heikki Linnakangas

On 07/06/2021 15:39, John Naylor wrote:
On Mon, Jun 7, 2021 at 8:24 AM Heikki Linnakangas  wrote:

 >
 > On 03/06/2021 21:58, John Naylor wrote:
 > > The microbenchmark is the same one you attached to [1], which I
 > > extended with a 95% multibyte case.
 > > with a 95% multibyte case.
 >
 > Could you share the exact test you're using? I'd like to test this on my
 > old raspberry pi, out of curiosity.

Sure, attached.

--
John Naylor
EDB: http://www.enterprisedb.com 


Results from chipmunk, my first generation Raspberry Pi:

Master:

 chinese | mixed | ascii
-+---+---
   25392 | 16287 | 10295
(1 row)

v11-0001-Rewrite-pg_utf8_verifystr-for-speed.patch:

 chinese | mixed | ascii
-+---+---
   17739 | 10854 |  4121
(1 row)

So that's good.

What is the worst-case scenario for this algorithm? Something where the 
new fast ASCII check never helps, but which the old code handles as fast 
as possible. For that, I added a repeating pattern of '123456789012345ä' to 
the test set (these results are from my Intel laptop, not the raspberry pi):


Master:

 chinese | mixed | ascii | mixed2
-+---+---+
1333 |   757 |   410 |573
(1 row)

v11-0001-Rewrite-pg_utf8_verifystr-for-speed.patch:

 chinese | mixed | ascii | mixed2
-+---+---+
 942 |   470 |66 |   1249
(1 row)

So there's a regression with that input. Maybe that's acceptable; this 
is the worst case, after all. Or you could tweak check_ascii for a 
different performance tradeoff, by checking the two 64-bit words 
separately and returning "8" if the failure happens in the second word. 
And I haven't tried the SSE patch yet, maybe that compensates for this.
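
To make that concrete, a sketch of the two-word variant (illustrative only,
not the patch; it assumes PostgreSQL's uint64/UINT64CONST from c.h, a
16-byte chunk, and folds in the stop-at-zero-byte requirement the verifier
also has):

    #define HIGHBIT_MASK UINT64CONST(0x8080808080808080)
    #define ONE_MASK     UINT64CONST(0x0101010101010101)

    /* true if these 8 bytes contain a non-ASCII byte or a zero byte */
    static inline bool
    word_has_non_ascii_or_zero(uint64 w)
    {
        return ((w & HIGHBIT_MASK) != 0) ||
               (((w - ONE_MASK) & ~w & HIGHBIT_MASK) != 0);
    }

    /* how many whole 8-byte words at the start are pure ASCII: 0, 8 or 16 bytes */
    static inline int
    check_ascii_words(const unsigned char *s)
    {
        uint64 w1, w2;

        memcpy(&w1, s, sizeof(uint64));
        memcpy(&w2, s + sizeof(uint64), sizeof(uint64));

        if (word_has_non_ascii_or_zero(w1))
            return 0;
        if (word_has_non_ascii_or_zero(w2))
            return 8;
        return 16;
    }

With something like that, the mixed2 input would still let the caller skip
the first 8 ASCII bytes before falling back to the per-character path.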


- Heikki




Re: Decoding speculative insert with toast leaks memory

2021-06-09 Thread Amit Kapila
On Wed, Jun 9, 2021 at 4:12 PM Dilip Kumar  wrote:
>
> On Wed, Jun 9, 2021 at 11:03 AM Amit Kapila  wrote:
>>
>> On Tue, Jun 8, 2021 at 5:16 PM Dilip Kumar  wrote:
>> >
>> > Based on the off-list discussion, I have modified the test based on
>> > the idea shown in
>> > "isolation/specs/insert-conflict-specconflict.spec". Regarding the other
>> > open point we had about the race condition - how to ensure that when we
>> > unlock any session it makes progress - IMHO the isolation tester is
>> > designed in a way that either all the waiting sessions return with the
>> > output or block again on a heavyweight lock before performing the next step.
>> >
>>
>> Few comments:
>> 1. The test has a lot of similarities and test duplication with what
>> we are doing in insert-conflict-specconflict.spec. Can we move it to
>> insert-conflict-specconflict.spec? I understand that having it in
>> test_decoding has the advantage that we can have all decoding tests in
>> one place but OTOH, we can avoid a lot of test-code duplication if we
>> add it in insert-conflict-specconflict.spec.
>>
>
> It seems the isolation test runs on the default configuration; would it be a 
> good idea to change wal_level to logical for the whole isolation tester 
> folder?
>

No, that doesn't sound like a good idea to me. Let's keep it in
test_decoding then.

-- 
With Regards,
Amit Kapila.




Re: postgres_fdw batching vs. (re)creating the tuple slots

2021-06-09 Thread Bharath Rupireddy
On Wed, Jun 9, 2021 at 4:00 PM Tomas Vondra
 wrote:
>
> Hi,
>
> Here's a v2 fixing a silly bug with reusing the same variable in two
> nested loops (worked for simple postgres_fdw cases, but "make check"
> failed).

I applied these patches and ran make check in postgres_fdw contrib
module, I saw a server crash. Is it the same failure you were saying
above?

With Regards,
Bharath Rupireddy.




Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

2021-06-09 Thread Ajin Cherian
On Wed, Jun 9, 2021 at 6:23 AM Jeff Davis  wrote:
>
> On Tue, 2021-06-08 at 17:41 +1000, Ajin Cherian wrote:
> > Here's an updated patchset that adds back in the option for two-phase
> > in CREATE_REPLICATION_SLOT command and a second patch that adds
> > support for
> > two-phase decoding in pg_recvlogical.
>
> A few things:
>
> * I suggest putting the TWO_PHASE keyword after the LOGICAL keyword
> * Document the TWO_PHASE keyword in doc/src/sgml/protocol.sgml
> * Cross check that --two-phase is specified only if --create-slot is
> specified
> * Maybe an Assert(!(two_phase && is_physical)) in
> CreateReplicationSlot()?
>
> Other than that, it looks good, and it works as I expect it to.


Updated. Do have a look.

thanks,
Ajin Cherian
Fujitsu Australia


v3-0002-Add-support-for-two-phase-decoding-in-pg_recvlogi.patch
Description: Binary data


v3-0001-Add-option-to-set-two-phase-in-CREATE_REPLICATION.patch
Description: Binary data


Re: Decoding speculative insert with toast leaks memory

2021-06-09 Thread Dilip Kumar
On Wed, Jun 9, 2021 at 11:03 AM Amit Kapila  wrote:

> On Tue, Jun 8, 2021 at 5:16 PM Dilip Kumar  wrote:
> >
> > Based on the off-list discussion, I have modified the test based on
> > the idea shown in
> > "isolation/specs/insert-conflict-specconflict.spec". Regarding the other
> > open point we had about the race condition - how to ensure that when we
> > unlock any session it makes progress - IMHO the isolation tester is
> > designed in a way that either all the waiting sessions return with the
> > output or block again on a heavyweight lock before performing the next step.
> >
>
> Few comments:
> 1. The test has a lot of similarities and test duplication with what
> we are doing in insert-conflict-specconflict.spec. Can we move it to
> insert-conflict-specconflict.spec? I understand that having it in
> test_decoding has the advantage that we can have all decoding tests in
> one place but OTOH, we can avoid a lot of test-code duplication if we
> add it in insert-conflict-specconflict.spec.
>
>
It seems the isolation test runs on the default configuration; would it be a
good idea to change wal_level to logical for the whole isolation tester
folder?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


How to pass a parameter in a query to postgreSQL 11

2021-06-09 Thread Hassan Camacho Cadre
Hello

I recently migrated from PostgreSQL version 8.3 to v11. Previously, in
all my queries I passed parameters using the character ":".

Example:
Where id =: searched

In the new version, when I try to run such a query it gives me an error:

ERROR syntax error at or near ":"

Could someone help me find out how I can configure the parameter-passing
character or, failing that, how I should pass parameters in this new
version?

Greetings

-- 
Atentamente Msc. Hassan Camacho.


Re: postgres_fdw batching vs. (re)creating the tuple slots

2021-06-09 Thread Tomas Vondra

Hi,

Here's a v2 fixing a silly bug with reusing the same variable in two 
nested loops (worked for simple postgres_fdw cases, but "make check" 
failed).


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From 494018fd3f2b983be474a85fc12fe3a4dbefa76b Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Fri, 4 Jun 2021 12:45:18 +0200
Subject: [PATCH 1/2] create copy of a descriptor for batching

---
 src/backend/executor/nodeModifyTable.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 379b056310..c287a371a1 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -678,6 +678,8 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelInfo->ri_BatchSize > 1)
 		{
+			TupleDesc tdesc;
+
 			/*
 			 * If a certain number of tuples have already been accumulated, or
 			 * a tuple has come for a different relation than that for the
@@ -703,13 +705,16 @@ ExecInsert(ModifyTableState *mtstate,
 	 resultRelInfo->ri_BatchSize);
 			}
 
+			tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
+
 			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
-MakeSingleTupleTableSlot(slot->tts_tupleDescriptor,
+MakeSingleTupleTableSlot(tdesc,
 		 slot->tts_ops);
 			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
 		 slot);
+
 			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
-MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
+MakeSingleTupleTableSlot(tdesc,
 		 planSlot->tts_ops);
 			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
 		 planSlot);
-- 
2.31.1

>From 22a7a2c295ccce850da5f02d6239975057345b32 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Fri, 4 Jun 2021 12:59:47 +0200
Subject: [PATCH 2/2] initialize slots only once for batching

---
 src/backend/executor/nodeModifyTable.c | 43 ++
 src/include/nodes/execnodes.h  |  1 +
 2 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index c287a371a1..81b3b522c2 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -678,8 +678,6 @@ ExecInsert(ModifyTableState *mtstate,
 		 */
 		if (resultRelInfo->ri_BatchSize > 1)
 		{
-			TupleDesc tdesc;
-
 			/*
 			 * If a certain number of tuples have already been accumulated, or
 			 * a tuple has come for a different relation than that for the
@@ -705,19 +703,25 @@ ExecInsert(ModifyTableState *mtstate,
 	 resultRelInfo->ri_BatchSize);
 			}
 
-			tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
+			/* initialize the slot, if not already done */
+			if (resultRelInfo->ri_NumSlots >= resultRelInfo->ri_BatchInitialized)
+			{
+TupleDesc tdesc = CreateTupleDescCopy(slot->tts_tupleDescriptor);
 
-			resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
-MakeSingleTupleTableSlot(tdesc,
-		 slot->tts_ops);
-			ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
-		 slot);
+resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
+	MakeSingleTupleTableSlot(tdesc,
+			 slot->tts_ops);
+ExecCopySlot(resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots],
+			 slot);
 
-			resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
-MakeSingleTupleTableSlot(tdesc,
-		 planSlot->tts_ops);
-			ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
-		 planSlot);
+resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
+	MakeSingleTupleTableSlot(tdesc,
+			 planSlot->tts_ops);
+ExecCopySlot(resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots],
+			 planSlot);
+
+resultRelInfo->ri_BatchInitialized++;
+			}
 
 			resultRelInfo->ri_NumSlots++;
 
@@ -1039,12 +1043,6 @@ ExecBatchInsert(ModifyTableState *mtstate,
 
 	if (canSetTag && numInserted > 0)
 		estate->es_processed += numInserted;
-
-	for (i = 0; i < numSlots; i++)
-	{
-		ExecDropSingleTupleTableSlot(slots[i]);
-		ExecDropSingleTupleTableSlot(planSlots[i]);
-	}
 }
 
 /* 
@@ -3167,6 +3165,7 @@ ExecEndModifyTable(ModifyTableState *node)
 	 */
 	for (i = 0; i < node->mt_nrels; i++)
 	{
+		int j;
 		ResultRelInfo *resultRelInfo = node->resultRelInfo + i;
 
 		if (!resultRelInfo->ri_usesFdwDirectModify &&
@@ -3174,6 +3173,12 @@ ExecEndModifyTable(ModifyTableState *node)
 			resultRelInfo->ri_FdwRoutine->EndForeignModify != NULL)
 			resultRelInfo->ri_FdwRoutine->EndForeignModify(node->ps.state,
 		   resultRelInfo);
+
+		for (j = 0; j < resultRelInfo->ri_NumSlots; j++)
+		{
+			ExecDropSingleTupleTableSlot(resultRelInfo->ri_Slots[i]);
+			ExecDropSingleTupleTableSlot(resultRelInfo->ri_PlanSlots[i]);

Re: Fdw batch insert error out when set batch_size > 65535

2021-06-09 Thread Tomas Vondra




On 6/9/21 12:22 PM, Tomas Vondra wrote:



On 6/9/21 8:28 AM, Tom Lane wrote:

I wrote:

Bharath Rupireddy  writes:
I've added a simple regression test to postgres_fdw, testing that batch
sizes > 65535 work fine, and pushed the fix.



I was earlier thinking of adding one, but stopped because it might
increase the regression test execution time. It looks like that's true
- with and without the test case it takes 17 sec and 4 sec
respectively on my dev system which is 4X slower. I'm not sure if this
is okay.



The cost, versus the odds of ever detecting a problem, doesn't
seem like a good tradeoff.


I took a quick look and noted that on buildfarm member longfin
(to take a random example that's sitting a few feet from me),
the time for contrib-install-check went from 34 seconds before
this patch to 40 seconds after.  I find that completely
unacceptable compared to the likely value of this test case.



Note that the problem here is [1] - we're creating a lot of slots 
referencing the same tuple descriptor, which inflates the duration. 
There's a fix in the other thread, which eliminates ~99% of the 
overhead. I plan to push that fix soon (a day or two).




Forgot to add the link:

[1] 
https://www.postgresql.org/message-id/ebbbcc7d-4286-8c28-0272-61b4753af761%40enterprisedb.com



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Fdw batch insert error out when set batch_size > 65535

2021-06-09 Thread Tomas Vondra




On 6/9/21 8:28 AM, Tom Lane wrote:

I wrote:

Bharath Rupireddy  writes:

I've added a simple regression test to postgres_fdw, testing that batch
sizes > 65535 work fine, and pushed the fix.



I was earlier thinking of adding one, but stopped because it might
increase the regression test execution time. It looks like that's true
- with and without the test case it takes 17 sec and 4 sec
respectively on my dev system which is 4X slower. I'm not sure if this
is okay.



The cost, versus the odds of ever detecting a problem, doesn't
seem like a good tradeoff.


I took a quick look and noted that on buildfarm member longfin
(to take a random example that's sitting a few feet from me),
the time for contrib-install-check went from 34 seconds before
this patch to 40 seconds after.  I find that completely
unacceptable compared to the likely value of this test case.



Note that the problem here is [1] - we're creating a lot of slots 
referencing the same tuple descriptor, which inflates the duration. 
There's a fix in the other thread, which eliminates ~99% of the 
overhead. I plan to push that fix soon (a day or two).



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




RE: locking [user] catalog tables vs 2pc vs logical rep

2021-06-09 Thread osumi.takami...@fujitsu.com
On Tuesday, June 8, 2021 5:04 PM I wrote:
> On Monday, June 7, 2021 6:22 PM Amit Kapila 
> wrote:
> > On Mon, Jun 7, 2021 at 10:44 AM Amit Kapila 
> > wrote:
> > >
> > > One more comment is that for HEAD, first just create a patch with
> > > synchronous replication-related doc changes and then write a
> > > separate patch for prepared transactions.
> >
> > I noticed that docs for "Synchronous replication support for Logical 
> > Decoding"
> > has been introduced by commit
> > 49c0864d7ef5227faa24f903902db90e5c9d5d69 which goes till 9.6. So, I
> > think you need to create a patch for 9.6 as well unless one of the
> > existing patches already applies in 9.6.
> OK. I could apply PG10's patch to 9.6.
> Also, I've made a separate patch for 2PC description.
> 
> On the other hand, I need to mention that there are some gaps to cause 
> failures
> to apply patches between supported versions.
> (e.g. applying a patch for HEAD to stable PG13 fails)
I scrutinized this point of view and checked the gaps between supported versions.
In terms of the section the patch wants to fix,
there are only 2 major gaps: between PG10 and PG11 - [1] -
and between PG13 and HEAD - [2]. In other words,
the patch set needs 4 variants.

* patch for HEAD
* additional patch for HEAD based on the 2PC patch-set
* patch for PG11 through PG13
* patch for PG9.6 and PG10

> To address the gaps between the versions, I needed to conduct some manual
> fixes.
> Therefore, please note that the content of patch between PG12 and PG13 are
> almost same like PG9.6 and PG10, but, I prepared independent patches for
> HEAD and PG11, in order to make those applied in a comfortable manner.
Therefore, I was wrong.
I didn't need the specific independent patch for PG11.
I'll fix the patch-set accordingly in the next version.


[1] how we finish xref tag is different between PG10 and PG11

--- logicaldecoding.sgml_PG11   2021-06-09 04:38:18.214163527 +
+++ logicaldecoding.sgml_PG10   2021-06-09 04:37:50.533163527 +
@@ -730,9 +698,9 @@
 replication solutions with the same user interface as synchronous
 replication for streaming
 replication.  To do this, the streaming replication interface
-(see ) must be used to stream 
out
+(see ) must be used to stream out
 data. Clients have to send Standby status update (F)
-(see ) messages, just like streaming
+(see ) messages, just like streaming
 replication clients do.


[2] in HEAD, we have a new sect1 after "Synchronous Replication Support for 
Logical Decoding"

--- logicaldecoding.sgml_PG13   2021-06-09 05:10:34.927163527 +
+++ logicaldecoding.sgml_HEAD   2021-06-09 05:08:12.810163527 +
@@ -747,4 +1089,177 @@
  

   
+
+  
+   Streaming of Large Transactions for Logical Decoding
+
+   
+The basic output plugin callbacks (e.g., begin_cb,
...


Best Regards,
Takamichi Osumi



Re: Race condition in recovery?

2021-06-09 Thread Dilip Kumar
On Wed, Jun 9, 2021 at 1:37 PM Dilip Kumar  wrote:
>
> On Wed, Jun 9, 2021 at 12:14 PM Dilip Kumar  wrote:
> >
> > On Wed, Jun 9, 2021 at 2:07 AM Robert Haas  wrote:
> > 2021-06-09 12:11:08.618 IST [122456] LOG:  entering standby mode
> > 2021-06-09 12:11:08.622 IST [122456] LOG:  restored log file 
> > "0002.history" from archive
> > cp: cannot stat 
> > ‘/home/dilipkumar/work/PG/postgresql/src/test/recovery/tmp_check/t_025_stuck_on_old_timeline_primary_data/archives/00010002’:
> >  No such file or directory
> > 2021-06-09 12:11:08.627 IST [122456] LOG:  redo starts at 0/228
> > 2021-06-09 12:11:08.627 IST [122456] LOG:  consistent recovery state 
> > reached at 0/300
> >
> > Next, I will investigate, without a fix on v11 (maybe v12, v10..) why it is 
> > not hitting the defect location at all.  And after that, I will check the 
> > status on other older versions.
>
> The reason for the problem was that the "-Xnone" parameter was not
> accepted by "sub backup" in PostgresNode.pm, so I created that for
> the backpatch.  With the attached patches I am able to make it pass in
> v12, v11 and v10 (with the fix) and fail (without the fix).  However, we
> will have to make some change for 9.6 because pg_basebackup doesn't
> support -Xnone on 9.6; maybe we can delete the content from pg_wal after
> the backup. If we think that approach looks fine then I will make the
> changes for 9.6 as well.
>
> Note: for the param backport, the same patch applies to v12 and v11,
> but for v10, due to a conflict, we need a separate patch (both
> attached).

I have fixed it for 9.6 as well by removing the WAL from the xlog
directory. Attaching all the patches in a single mail to avoid
confusion.

Note:
v7-0001 applies to master, v13 and v12 (but for v12 we need to apply
the backport patch first)
v12-v8-0001-Backport is the same as v11-v8-0001-Backport (duplicated for
version-wise separation)
v11-v8-0002 is the same as v10-v8-0002

Basically, for v12 and v11 the same backport patch works, and for v11 and
v10 the same main patch works; still, I duplicated them to avoid confusion.
-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 8b96729a51015e850672b2d685ec1b61490f967c Mon Sep 17 00:00:00 2001
From: Robert Haas 
Date: Tue, 8 Jun 2021 14:31:22 -0400
Subject: [PATCH v8 1/2] Back-port a few PostgresNode.pm methods.

The 'lsn' and 'wait_for_catchup' methods only exist in v10 and
higher, but are needed in order to support a planned test
case for a bug that exists all the way back to v9.6. To minimize
cross-branch differences in the test case, back-port these
methods.

Discussion: http://postgr.es/m/ca+tgmoag5dma_8xc1wvbvftpjtwx5uzkgehxe7mij+im9jy...@mail.gmail.com
---
 src/test/perl/PostgresNode.pm | 83 +++
 1 file changed, 83 insertions(+)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 5c59265..5f5a7d2 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1472,6 +1472,89 @@ sub issues_sql_like
 
 =pod
 
+=item $node->lsn(mode)
+
+Look up WAL locations on the server:
+
+ * insert location (master only, error on replica)
+ * write location (master only, error on replica)
+ * flush location (master only, error on replica)
+ * receive location (always undef on master)
+ * replay location (always undef on master)
+
+mode must be specified.
+
+=cut
+
+sub lsn
+{
+	my ($self, $mode) = @_;
+	my %modes = (
+		'insert'  => 'pg_current_xlog_insert_location()',
+		'flush'   => 'pg_current_xlog_flush_location()',
+		'write'   => 'pg_current_xlog_location()',
+		'receive' => 'pg_last_xlog_receive_location()',
+		'replay'  => 'pg_last_xlog_replay_location()');
+
+	$mode = '' if !defined($mode);
+	die "unknown mode for 'lsn': '$mode', valid modes are "
+	  . join(', ', keys %modes)
+	  if !defined($modes{$mode});
+
+	my $result = $self->safe_psql('postgres', "SELECT $modes{$mode}");
+	chomp($result);
+	if ($result eq '')
+	{
+		return;
+	}
+	else
+	{
+		return $result;
+	}
+}
+
+=pod
+
+=item $node->wait_for_catchup(standby_name, mode, target_lsn)
+
+Wait for the node with application_name standby_name (usually from node->name)
+until its replication position in pg_stat_replication equals or passes the
+upstream's xlog insert point at the time this function is called. By default
+the replay_location is waited for, but 'mode' may be specified to wait for any
+of sent|write|flush|replay.
+
+If there is no active replication connection from this peer, waits until
+poll_query_until timeout.
+
+Requires that the 'postgres' db exists and is accessible.
+
+target_lsn may be any arbitrary lsn, but is typically $master_node->lsn('insert').
+
+This is not a test. It die()s on failure.
+
+=cut
+
+sub wait_for_catchup
+{
+	my ($self, $standby_name, $mode, $target_lsn) = @_;
+	$mode = defined($mode) ? $mode : 'replay';
+	my %valid_modes = ( 'sent' => 1, 'write' => 1, 'flush' => 1, 'replay' => 1 );
+	die "unknown mode $mode for 'wait_for_catchup', valid modes are " 

Re: Logical replication keepalive flood

2021-06-09 Thread Amit Kapila
On Wed, Jun 9, 2021 at 1:47 PM Kyotaro Horiguchi
 wrote:
>
> At Wed, 9 Jun 2021 11:21:55 +0900, Kyotaro Horiguchi 
>  wrote in
> > The issue - if actually it is - we send a keep-alive packet before a
> > quite short sleep.
> >
> > We really want to send it if the sleep gets long but we cannot predict
> > that before entering a sleep.
> >
> > Let me think a little more on this..
>
> After some investigation, I found out that the keepalives are sent
> almost always after XLogSendLogical requests for the *next* record.
>

Are these keepalive messages sent at the same frequency for
subscribers too? Basically, I wanted to check: if we have logical
replication set up between 2 nodes, do we send this flood of keep-alive
messages? If not, then why is it different in the case of
pg_recvlogical? Is it possible that the write/flush location is not
updated at the pace at which we expect? Please see commit 41d5f8ad73
which seems to be talking about a similar problem.

-- 
With Regards,
Amit Kapila.




Re: Duplicate history file?

2021-06-09 Thread Kyotaro Horiguchi
At Wed, 09 Jun 2021 16:56:14 +0900, Tatsuro Yamada 
 wrote in 
> Hi,
> 
> On 2021/06/09 16:23, Fujii Masao wrote:
> > On 2021/06/09 15:58, Tatsuro Yamada wrote:
> >> This may not be important at this time since it is a
> >> PoC patch, but I would like to inform you that there
> >> was a line that contained multiple spaces instead of tabs.
> >>
> >> $ git diff --check
> >> src/backend/access/transam/xlogarchive.c:465: trailing whitespace.
> >> +
> > Even with the patch, if "test ! -f ..." is used in archive_command,
> > you may still *easily* get the trouble that WAL archiving keeps
> > failing?

I'm not sure, but with regard to the cause that the patch addresses (an
already-archived file is recycled or deleted and then the same file is
restored from the archive), that could happen. But the WAL segment that
contains the latest checkpoint won't be deleted. The same can be said
of history files.

> Thanks for your comment.
> 
> Yes, it may solve the error when using the test command, but it is
> dangerous to continue using the cp command, which is listed as an
> example of an archive command.

"test" command?

At first I thought that the archive command would need to compare the whole
file content *always*, but that happens with the same frequency at which
the patch runs a whole-file comparison.

> > Instead, we should consider and document "better" command for
> > archive_command, or implement something like pg_archivecopy command
> > into the core (as far as I remember, there was the discussion about
> > this feature before...)?
> 
> 
> I agree with that idea.
> Since archiving is important for all users, I think there should be
> either a better and safer command in the documentation, or an archive
> command (pg_archivecopy?) that we provide as a community, as you said.
> I am curious about the conclusions of past discussions. :)

How perfect does the officially-provided script or command need to be?  The
reason the script in the documentation is so simple is, I guess, that
we don't/can't offer steps sufficiently solid for all situations.

We didn't notice that the "test ! -f" does harm, so it has stayed
there, but finally we need to remove it.  Instead, we need to write
down the known significant requirements in words. I'm afraid that a
concrete script would be a bit complex for the documentation..

So what we can do that is:

 - Remove the "test ! -f" from the sample command (for *nixen).

 - Rewrite at least the following portion in the documentation. [1]

   > The archive command should generally be designed to refuse to
   > overwrite any pre-existing archive file. This is an important
   > safety feature to preserve the integrity of your archive in case
   > of administrator error (such as sending the output of two
   > different servers to the same archive directory).
   > 
   > It is advisable to test your proposed archive command to ensure
   > that it indeed does not overwrite an existing file, and that it
   > returns nonzero status in this case. The example command above
   > for Unix ensures this by including a separate test step. On some
   > Unix platforms, cp has switches such as -i that can be used to do
   > the same thing less verbosely, but you should not rely on these
   > without verifying that the right exit status is returned. (In
   > particular, GNU cp will return status zero when -i is used and
   > the target file already exists, which is not the desired
   > behavior.)

The replacement would be something like:

"There is a case where WAL file and timeline history files is archived
more than once.  The archive command should generally be designed to
refuse to replace any pre-existing archive file with a file with
different content but to return zero if the file to be archived is
identical with the preexisting file."

But I'm not sure how it looks like.. (even ignoring the broken
phrasing..)
 

1: https://www.postgresql.org/docs/11/continuous-archiving.html

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: logical replication of truncate command with trigger causes Assert

2021-06-09 Thread Amit Kapila
On Wed, Jun 9, 2021 at 5:29 AM Tom Lane  wrote:
>
> Mark Dilger  writes:
> > On Jun 8, 2021, at 3:55 PM, Tom Lane  wrote:
> >> I suppose that either apply_dispatch or LogicalRepApplyLoop needs to
> >> grow some more snapshot management logic, but I've not looked at that
> >> code much, so I don't have an opinion on just where to add it.
>
> > I was looking at those for other reasons prior to hitting this bug.
>
> After looking at it a bit, I see a couple of options:
>
> 1. Just wrap the call of ExecuteTruncateGuts with
> PushActiveSnapshot(GetTransactionSnapshot()) and PopActiveSnapshot().
>
> 2. Decide that we ought to ensure that a snapshot exists throughout
> most of this code.  It's not entirely obvious to me that there is no
> code path reachable from, say, apply_handle_truncate's collection of
> relation OIDs that needs a snapshot.  If we went for that, I'd think
> the right solution is to do PushActiveSnapshot right after each
> ensure_transaction call, and then PopActiveSnapshot on the way out of
> the respective subroutine.  We could then drop the snapshot management
> calls that are currently associated with the executor state.
>

+1 for the second option. With that, apart from what you said, it
will take some load off future developers in deciding which parts of
the changes need to run after acquiring a snapshot.
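
For readers following along, option 2 would roughly give each apply handler
the shape below; this is only an illustrative sketch, not the actual change:

    ensure_transaction();
    PushActiveSnapshot(GetTransactionSnapshot());

    /* ... collect relation OIDs, open relations, do the real work ... */

    PopActiveSnapshot();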

-- 
With Regards,
Amit Kapila.




Re: locking [user] catalog tables vs 2pc vs logical rep

2021-06-09 Thread Amit Kapila
On Wed, Jun 9, 2021 at 12:03 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Wednesday, June 9, 2021 12:06 PM Amit Kapila  
> wrote:
> > On Tue, Jun 8, 2021 at 6:24 PM vignesh C  wrote:
>
> > > 3) Should [user] catalog tables be catalog tables or user catalog
> > > tables [user] catalog tables
> > >
> >
> > The third point is not clear. Can you please elaborate by quoting the exact
> > change from the patch?
> IIUC, he means to replace all descriptions "[user] catalog tables"
> with "catalog tables or user catalog tables" in the patch,
> because seemingly we don't use square brackets to describe an optional clause
> in normal descriptions (i.e., outside of a synopsis; I don't find any example
> of this).
> But, even if so, I would like to keep the current square-brackets description,
> which keeps the sentence short and simple.
>

+1.

-- 
With Regards,
Amit Kapila.




Re: Logical replication keepalive flood

2021-06-09 Thread Kyotaro Horiguchi
At Wed, 9 Jun 2021 11:21:55 +0900, Kyotaro Horiguchi  
wrote in 
> The issue - if actually it is - we send a keep-alive packet before a
> quite short sleep.
> 
> We really want to send it if the sleep gets long but we cannot predict
> that before entering a sleep.
> 
> Let me think a little more on this..

After some investigation, I found out that the keepalives are sent
almost always after XLogSendLogical requests the *next* record. In
most of the cases the record is not yet inserted at the request time
but is inserted very soon after (within single-digit milliseconds). That is
not expected to happen with such a high frequency when
XLogSendLogical is keeping up to date with the bleeding edge of WAL
records.

It is completely unpredictable when the next record comes, so we
cannot decide whether to send a keepalive or not at the current
timing.

Since we want to send a keepalive when we have had nothing to send for a
while, that is a different thing from keeping sending keepalives at some
interval while the loop is busy.

As a possible solution, the attached patch splits the sleep into two
pieces. If the first sleep reaches the timeout, send a keepalive, then
sleep for the remaining time. The first timeout is quite arbitrary, but
a keepalive rate of at most 4Hz doesn't look so bad to me.

Is it acceptable?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 109c723f4e..49b3c0d4e2 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -105,6 +105,9 @@
  */
 #define MAX_SEND_SIZE (XLOG_BLCKSZ * 16)
 
+/* Minimum idle time for sending an idle-time keepalive in milliseconds */
+#define KEEPALIVE_TIMEOUT 250
+
 /* Array of WalSnds in shared memory */
 WalSndCtlData *WalSndCtl = NULL;
 
@@ -244,7 +247,7 @@ static void WalSndKeepalive(bool requestReply);
 static void WalSndKeepaliveIfNecessary(void);
 static void WalSndCheckTimeOut(void);
 static long WalSndComputeSleeptime(TimestampTz now);
-static void WalSndWait(uint32 socket_events, long timeout, uint32 wait_event);
+static int WalSndWait(uint32 socket_events, long timeout, uint32 wait_event);
 static void WalSndPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool last_write);
 static void WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool last_write);
 static void WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid);
@@ -1428,19 +1431,6 @@ WalSndWaitForWal(XLogRecPtr loc)
 		if (got_STOPPING)
 			break;
 
-		/*
-		 * We only send regular messages to the client for full decoded
-		 * transactions, but a synchronous replication and walsender shutdown
-		 * possibly are waiting for a later location. So, before sleeping, we
-		 * send a ping containing the flush location. If the receiver is
-		 * otherwise idle, this keepalive will trigger a reply. Processing the
-		 * reply will update these MyWalSnd locations.
-		 */
-		if (MyWalSnd->flush < sentPtr &&
-			MyWalSnd->write < sentPtr &&
-			!waiting_for_ping_response)
-			WalSndKeepalive(false);
-
 		/* check whether we're done */
 		if (loc <= RecentFlushPtr)
 			break;
@@ -1483,6 +1473,39 @@ WalSndWaitForWal(XLogRecPtr loc)
 		if (pq_is_send_pending())
 			wakeEvents |= WL_SOCKET_WRITEABLE;
 
+		/*
+		 * We only send regular messages to the client for full decoded
+		 * transactions, but a synchronous replication and walsender shutdown
+		 * possibly are waiting for a later location. So, before sleeping, we
+		 * send a ping containing the flush location. If the receiver is
+		 * otherwise idle, this keepalive will trigger a reply. Processing the
+		 * reply will update these MyWalSnd locations. If the sleep is shorter
+		 * than KEEPALIVE_TIMEOUT milliseconds, we skip sending a keepalive to
+		 * prevent it from getting too-frequent.
+		 */
+		if (MyWalSnd->flush < sentPtr &&
+			MyWalSnd->write < sentPtr &&
+			!waiting_for_ping_response)
+		{
+			if (sleeptime > KEEPALIVE_TIMEOUT)
+			{
+int r;
+
+r = WalSndWait(wakeEvents, KEEPALIVE_TIMEOUT,
+			   WAIT_EVENT_WAL_SENDER_WAIT_WAL);
+
+if (r != 0)
+	continue;
+
+sleeptime -= KEEPALIVE_TIMEOUT;
+			}
+
+			WalSndKeepalive(false);
+
+			if (pq_flush_if_writable() != 0)
+WalSndShutdown();
+		}
+		
 		WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_WAIT_WAL);
 	}
 
@@ -3136,15 +3159,18 @@ WalSndWakeup(void)
  * composed of optional WL_SOCKET_WRITEABLE and WL_SOCKET_READABLE flags.  Exit
  * on postmaster death.
  */
-static void
+static int
 WalSndWait(uint32 socket_events, long timeout, uint32 wait_event)
 {
 	WaitEvent	event;
+	int			ret;
 
 	ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, socket_events, NULL);
-	if (WaitEventSetWait(FeBeWaitSet, timeout, &event, 1, wait_event) == 1 &&
-		(event.events & WL_POSTMASTER_DEATH))
+	ret = WaitEventSetWait(FeBeWaitSet, timeout, &event, 1, wait_event);
+	if (ret == 

RE: Transactions involving multiple postgres foreign servers, take 2

2021-06-09 Thread tsunakawa.ta...@fujitsu.com
From: Masahiko Sawada 
> Maybe it's better to start a new thread to discuss this topic. If your
> idea is good, we can lower all error that happened after writing the
> commit record to warning, reducing the cases where the client gets
> confusion by receiving an error after the commit.

No.  It's an important part because it determines the 2PC behavior and 
performance.  This discussion had started from the concern about performance 
before Ikeda-san reported pathological results.  Don't rush forward, hoping 
someone will commit the current patch.  I'm afraid you just don't want to 
change your design and code.  Let's face the real issue.

As I said before, and as Ikeda-san's performance benchmark results show, I have 
to say the design isn't done sufficiently.  I talked with Fujii-san the other 
day about this patch.  The patch is already huge and it's difficult to decode 
how the patch works, e.g., what kind of new WALs it emits, how many disk writes 
it adds, how the error is handled, whether/how it's different from the textbook 
or other existing designs, etc.  What happened to my request to add such a design 
description to the following page, so that reviewers can consider the design 
before spending much time looking at the code?  What's the situation of the 
new FDW API that should naturally accommodate other FDW implementations?

Atomic Commit of Distributed Transactions
https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions

Design should come first.  I don't think it's a sincere attitude to require 
reviewers to spend long time to read the design from huge code.


Regards
Takayuki Tsunakawa


Re: Race condition in recovery?

2021-06-09 Thread Dilip Kumar
On Wed, Jun 9, 2021 at 12:14 PM Dilip Kumar  wrote:
>
> On Wed, Jun 9, 2021 at 2:07 AM Robert Haas  wrote:
> 2021-06-09 12:11:08.618 IST [122456] LOG:  entering standby mode
> 2021-06-09 12:11:08.622 IST [122456] LOG:  restored log file 
> "0002.history" from archive
> cp: cannot stat 
> ‘/home/dilipkumar/work/PG/postgresql/src/test/recovery/tmp_check/t_025_stuck_on_old_timeline_primary_data/archives/00010002’:
>  No such file or directory
> 2021-06-09 12:11:08.627 IST [122456] LOG:  redo starts at 0/228
> 2021-06-09 12:11:08.627 IST [122456] LOG:  consistent recovery state reached 
> at 0/300
>
> Next, I will investigate, without a fix on v11 (maybe v12, v10..) why it is 
> not hitting the defect location at all.  And after that, I will check the 
> status on other older versions.

The reason for the problem was that the "-Xnone" parameter was not
accepted by "sub backup" in PostgresNode.pm, so I created that for
the backpatch.  With the attached patches I am able to make it pass in
v12, v11 and v10 (with the fix) and fail (without the fix).  However, we
will have to make some change for 9.6 because pg_basebackup doesn't
support -Xnone on 9.6; maybe we can delete the content from pg_wal after
the backup. If we think that approach looks fine then I will make the
changes for 9.6 as well.

Note: for the param backport, the same patch applies to v12 and v11,
but for v10, due to a conflict, we need a separate patch (both
attached).

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 3782d36bd8821b1e7785fbd247aafda6a6cf8975 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 9 Jun 2021 13:15:22 +0530
Subject: [PATCH] Back-port backup param in PostgresNode.pm

---
 src/test/perl/PostgresNode.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index fdcc159..9d895c1 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -512,13 +512,13 @@ target server since it isn't done by default.
 
 sub backup
 {
-	my ($self, $backup_name) = @_;
+	my ($self, $backup_name, %params) = @_;
 	my $backup_path = $self->backup_dir . '/' . $backup_name;
 	my $name= $self->name;
 
 	print "# Taking pg_basebackup $backup_name from node \"$name\"\n";
 	TestLib::system_or_bail('pg_basebackup', '-D', $backup_path, '-h',
-		$self->host, '-p', $self->port, '--no-sync');
+		$self->host, '-p', $self->port, '--no-sync', @{ $params{backup_options} });
 	print "# Backup finished\n";
 }
 
-- 
1.8.3.1

From a52e20bd0bde14d5e194e3d853b9f6ea72019ad5 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 9 Jun 2021 12:52:42 +0530
Subject: [PATCH v8] Back port backup param in PostgresNode.pm

---
 src/test/perl/PostgresNode.pm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 61aa048..beb7bc1 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -548,7 +548,7 @@ target server since it isn't done by default.
 
 sub backup
 {
-	my ($self, $backup_name) = @_;
+	my ($self, $backup_name, %params) = @_;
 	my $backup_path = $self->backup_dir . '/' . $backup_name;
 	my $name= $self->name;
 
@@ -556,7 +556,8 @@ sub backup
 	TestLib::system_or_bail(
 		'pg_basebackup', '-D', $backup_path, '-h',
 		$self->host, '-p', $self->port,  '--checkpoint',
-		'fast',  '--no-sync');
+		'fast',  '--no-sync',
+		@{ $params{backup_options} });
 	print "# Backup finished\n";
 	return;
 }
-- 
1.8.3.1

From 0c8e0ebeb480b434b58e29c46594f873db3f7087 Mon Sep 17 00:00:00 2001
From: Robert Haas 
Date: Tue, 8 Jun 2021 12:52:55 -0400
Subject: [PATCH v8 2/2] Fix corner case failure of new standby to follow new
 primary.

This only happens if (1) the new standby has no WAL available locally,
(2) the new standby is starting from the old timeline, (3) the promotion
happened in the WAL segment from which the new standby is starting,
(4) the timeline history file for the new timeline is available from
the archive but the WAL files for are not (i.e. this is a race),
(5) the WAL files for the new timeline are available via streaming,
and (6) recovery_target_timeline='latest'.

Commit ee994272ca50f70b53074f0febaec97e28f83c4e introduced this
logic and was an improvement over the previous code, but it mishandled
this case. If recovery_target_timeline='latest' and restore_command is
set, validateRecoveryParameters() can change recoveryTargetTLI to be
different from receiveTLI. If streaming is then tried afterward,
expectedTLEs gets initialized with the history of the wrong timeline.
It's supposed to be a list of entries explaining how to get to the
target timeline, but in this case it ends up with a list of entries
explaining how to get to the new standby's original timeline, which
isn't right.

Dilip Kumar and Robert Haas, reviewed by Kyotaro Horiguchi.

Discussion: 

Re: Duplicate history file?

2021-06-09 Thread Tatsuro Yamada

Hi,

On 2021/06/09 16:23, Fujii Masao wrote:

On 2021/06/09 15:58, Tatsuro Yamada wrote:

This may not be important at this time since it is a
PoC patch, but I would like to inform you that there
was a line that contained multiple spaces instead of tabs.

$ git diff --check
src/backend/access/transam/xlogarchive.c:465: trailing whitespace.
+


Even with the patch, if "test ! -f ..." is used in archive_command,
you may still *easily* get the trouble that WAL archiving keeps failing?


Thanks for your comment.

Yes, it may solve the error when using the test command, but it is
dangerous to continue using the cp command, which is listed as an
example of an archive command.

 

Instead, we should consider and document "better" command for
archive_command, or implement something like pg_archivecopy command
into the core (as far as I remember, there was the discussion about
this feature before...)?



I agree with that idea.
Since archiving is important for all users, I think there should be
either a better and safer command in the documentation, or an archive
command (pg_archivecopy?) that we provide as a community, as you said.
I am curious about the conclusions of past discussions. :)


Regards,
Tatsuro Yamada






Re: Error on pgbench logs

2021-06-09 Thread Fabien COELHO


Hello Michael,


The cause is that the time unit is changed to usec but the patch
forgot to convert agg_interval into the same unit in doLog. I was tempted
to change it into pg_time_usec_t, but it seems better that
the unit stay the same as other similar variables like duration.


As the option remains in seconds, I think that it is simpler to keep
it as an int, and do the conversion where need be.  It would be good
to document that agg_interval is in seconds where the variable is
defined.

-   while (agg->start_time + agg_interval <= now)
+   while (agg->start_time + agg_interval * 1000000 <= now)

In need of a cast with (int64), no?


Yes, it would be better. In practice I would not expect the interval to be 
large enough to trigger an overflow (maxint µs is about 36 minutes).
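
For the record, the overflow-safe form is simple enough; a sketch using the
existing variable names (agg->start_time and now are in microseconds,
agg_interval stays in seconds):

  /* sketch: agg_interval stays an int in seconds, widened at the use site */
  while (agg->start_time + (int64) agg_interval * 1000000 <= now)
  {
      /* ... emit the aggregate line for this interval ... */
      agg->start_time += (int64) agg_interval * 1000000;
  }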



The other things are "progress" and "duration".  These look correctly
handled to me.


Hmmm… What about tests?

I'm pretty sure that I wrote a test for time-sensitive features with a
2-second run (-T, -P, maybe these aggregates as well), but the test needed
to be quite loose so as to pass on slow or heavily loaded hosts, and it was
removed at some point on the grounds that it was somewhat imprecise.

I'm not sure whether it is worth trying again.

--
Fabien.

Re: RFC: Logging plan of the running query

2021-06-09 Thread torikoshia

On 2021-05-28 15:51, torikoshia wrote:

On 2021-05-13 21:57, Dilip Kumar wrote:
On Thu, May 13, 2021 at 5:18 PM Dilip Kumar  
wrote:


On Thu, May 13, 2021 at 5:15 PM Bharath Rupireddy
 wrote:
>
> On Thu, May 13, 2021 at 5:14 PM Dilip Kumar  wrote:
> >
> > On Thu, May 13, 2021 at 4:16 PM Bharath Rupireddy
> >  wrote:
> > >
> > > I'm saying that -  currently, queries are logged with LOG level when
> > > the log_statement GUC is set. The queries might be sent to the
> > > non-superuser clients. So, your point of "sending the plan to those
> > > clients is not a good idea from a security perspective" gets violated
> > > right? Should the log level be changed(in the below code) from "LOG"
> > > to "LOG_SERVER_ONLY"? I think we can discuss this separately so as not
> > > to sidetrack the main feature.
> > >
> > > /* Log immediately if dictated by log_statement */
> > > if (check_log_statement(parsetree_list))
> > > {
> > > ereport(LOG,
> > > (errmsg("statement: %s", query_string),
> > >  errhidestmt(true),
> > >  errdetail_execute(parsetree_list)));
> > >
> >
> > Yes, that was my exact point, that in this particular code log with
> > LOG_SERVER_ONLY.
> >
> > Like this.
> >  /* Log immediately if dictated by log_statement */
> >  if (check_log_statement(parsetree_list))
> >  {
> >  ereport(LOG_SERVER_ONLY,
>
> Agree, but let's discuss that in a separate thread.

I did not understand why a separate thread; isn't this part of this
thread? But anyway, everyone has now agreed that we will log with
LOG_SERVER_ONLY.


Modified elevel from LOG to LOG_SERVER_ONLY.

I also modified the patch to log JIT Summary and GUC settings 
information.

If there is any other useful information to log, I would appreciate it if
you could point it out.


Updated the patch.

- reordered the superuser check, which was pointed out in another thread [1]
- added a regression test

[1] https://postgr.es/m/ylxw1uvgiap5u...@paquier.xyz
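
To recap how the request travels, it is the usual procsignal round-trip; a
simplified sketch follows (the identifiers below are illustrative only, the
exact ones are in the attached patch):

  /* Requesting side: pg_log_current_plan(pid) essentially does this. */
  SendProcSignal(pid, PROCSIG_LOG_CURRENT_PLAN, InvalidBackendId);

  /* Target backend, in its procsignal handler: only set flags and the latch. */
  void
  HandleLogCurrentPlanInterrupt(void)
  {
      InterruptPending = true;
      LogCurrentPlanPending = true;
      SetLatch(MyLatch);
  }

  /* Target backend, at the next CHECK_FOR_INTERRUPTS(): do the actual work. */
  if (LogCurrentPlanPending)
  {
      LogCurrentPlanPending = false;
      ProcessLogCurrentPlanInterrupt();   /* builds the EXPLAIN text and logs
                                           * it at LOG_SERVER_ONLY */
  }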

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 7ad9a280b6f74e4718293863716046c02b0a3835 Mon Sep 17 00:00:00 2001
From: atorik 
Date: Wed, 9 Jun 2021 15:03:39 +0900
Subject: [PATCH v3] Add function to log the untruncated query string and its
 plan for the query currently running on the backend with the specified
 process ID.

Currently, we have to wait for the query execution to finish
before we check its plan. This is not so convenient when
investigating long-running queries on production environments
where we cannot use debuggers.
To improve this situation, this patch adds
pg_log_current_plan() function that requests to log the plan
of the specified backend process.

Only superusers are allowed to request logging of the plans, because
allowing any user to issue such requests at an unbounded rate would
cause lots of log messages, which could lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its plan at LOG_SERVER_ONLY level, so
that these plans will appear in the server log but not be sent
to the client.
---
 doc/src/sgml/func.sgml   |  21 
 src/backend/commands/explain.c   | 116 +++
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/tcop/postgres.c  |   4 +
 src/backend/utils/init/globals.c |   1 +
 src/include/catalog/pg_proc.dat  |   6 +
 src/include/commands/explain.h   |   2 +
 src/include/miscadmin.h  |   1 +
 src/include/storage/procsignal.h |   1 +
 src/test/regress/expected/misc_functions.out |  10 +-
 src/test/regress/sql/misc_functions.sql  |   6 +-
 11 files changed, 167 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 08b07f561e..399fa12aa2 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24940,6 +24940,27 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_current_plan
+
+pg_log_current_plan ( pid integer )
+boolean
+   
+   
+Requests to log the untruncated query string and its plan for
+the query currently running on the backend with the specified
+process ID.
+They will be logged at LOG message level and
+will appear in the server log based on the log
+configuration set (See 
+for more information), but will not be sent to the client
+regardless of .
+Only superusers can request to log the plan of the running query.
+   
+  
+
   

 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9a60865d19..1cea557764 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -21,6 +21,7 @@
 #include "executor/nodeHash.h"
 #include "foreign/fdwapi.h"
 #include "jit/jit.h"
+#include "miscadmin.h"
 #include 

Re: [PATCH] In psql \?, add [+] annotation where appropriate

2021-06-09 Thread Michael Paquier
On Tue, May 25, 2021 at 06:10:15AM +, Neil Chen wrote:
> Hi, thank you for your work. I think this is a meaningful patch that
> should be merged.

Merged, then.  I have scanned the rest of the area and did not notice
any other inconsistencies.
--
Michael


signature.asc
Description: PGP signature


Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-09 Thread Masahiko Sawada
On Wed, Jun 9, 2021 at 4:10 PM tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Masahiko Sawada 
> > On Tue, Jun 8, 2021 at 5:28 PM tsunakawa.ta...@fujitsu.com
> >  wrote:
> > > Then, in what kind of scenario are we talking about the difficulty, and 
> > > how is
> > it difficult to handle, when we adopt either the method 1 or 2?  (I'd just 
> > like to
> > have the same clear picture.)
> >
> > IMO, even though the FDW's commit/rollback transaction code could be
> > simple in some cases, I think we need to assume that any kind of error
> > (or even FATAL or PANIC) could be thrown from the FDW code. It could
> > be an error due to a temporary network problem, the remote server
> > going down, an unexpected driver error, out of memory, etc. Errors
> > that happen after the local transaction commit don't affect the global
> > transaction decision, as you mentioned. But the process or system
> > could be in a bad state. Also, users might expect the process to exit
> > on error by setting exit_on_error = on. Your idea sounds like we would
> > have to ignore any error happening after the local commit as long as
> > it doesn't affect the transaction outcome. That's too scary to me, and
> > I think it's a bad idea to blindly ignore all possible errors under
> > such conditions. It could make things worse and will likely be a
> > foot-gun. It would be good if we could prove that it's safe to ignore
> > those errors, but at least I am not sure how we can.
> > This situation is true even today; an error could happen after
> > committing the transaction. But I personally don’t want to add the
> > code that increases the likelihood.
>
> I'm not talking about the code simplicity here (actually, I haven't reviewed 
> the code around prepare and commit in the patch yet...)  Also, I don't
> quite understand what point you're trying to make, or what realistic
> situations you have in mind, by citing exit_on_error, FATAL, PANIC and
> so on.  I just asked (in a different part) why the client has to know
> about the error.
>
> Just to be clear, I'm not saying that we should hide the error completely 
> behind the scenes.  For example, you can allow the FDW to emit a WARNING if 
> the DBMS-specific client driver returns an error when committing.  Further, 
> if you want to allow the FDW to throw an ERROR when committing, the 
> transaction manager in core can catch it by PG_TRY(), so that it can report 
> back successful commit of the global transaction to the client while it
> leaves the handling of the FDW's failed commit to the resolver.  (I don't
> think we would want to use PG_TRY() during transaction commit for performance
> reasons, though.)
>
> Even granting all of that, let's say we want to report the error of
> the committing FDW to the client.  If that's the case, we can use SQLSTATE 
> 02xxx (Warning) and attach the error message.
>

Maybe it's better to start a new thread to discuss this topic. If your
idea is good, we can demote all errors that happen after writing the
commit record to warnings, reducing the cases where the client gets
confused by receiving an error after the commit.
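
For reference, I suppose the PG_TRY()-based demotion you describe would look
roughly like the following; this is only a sketch, and the commit callback
name is made up rather than taken from the patch:

  MemoryContext oldcontext = CurrentMemoryContext;

  PG_TRY();
  {
      /* hypothetical per-server commit callback, not the patch's actual API */
      FdwCommitForeignTransaction(fdwxact);
  }
  PG_CATCH();
  {
      ErrorData  *edata;

      /* switch back so CopyErrorData() does not allocate in ErrorContext */
      MemoryContextSwitchTo(oldcontext);
      edata = CopyErrorData();
      FlushErrorState();

      /* report the failure but keep the already-committed local transaction */
      ereport(WARNING,
              (errmsg("could not commit foreign transaction: %s",
                      edata->message)));
      FreeErrorData(edata);

      /* the resolver is left to retry or clean up this foreign transaction */
  }
  PG_END_TRY();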

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Duplicate history file?

2021-06-09 Thread Fujii Masao




On 2021/06/09 15:58, Tatsuro Yamada wrote:

Hi


Thank you for fixing the patch.
The new patch works well in my environment. :-D


This may not be important at this time since it is a
PoC patch, but I would like to inform you that there
was a line that contained multiple spaces instead of tabs.

$ git diff --check
src/backend/access/transam/xlogarchive.c:465: trailing whitespace.
+


Even with the patch, if "test ! -f ..." is used in archive_command,
you may still *easily* run into the problem that WAL archiving keeps failing?

Instead, we should consider and document a "better" command for
archive_command, or implement something like a pg_archivecopy command
in core (as far as I remember, there was a discussion about
this feature before...)?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



