Re: Generating code for query jumbling through gen_node_support.pl

2023-01-16 Thread Michael Paquier
On Tue, Jan 17, 2023 at 08:43:44AM +0100, Peter Eisentraut wrote:
> Ok, I understand now, and I agree with this approach over the opposite. I
> was confused because the snippet you showed above used "jumble_ignore", but
> your patch is correct as it uses "jumble_location".

Okay.  I'll refresh the patch set so that we have only "jumble_ignore",
then, like v1, with preparatory patches for what you mentioned and
anything that comes to mind.

> That said, the term "jumble" is really weird, because in the sense that we
> are using it here it means, approximately, "to mix together", "to unify".
> So what we are doing with the Const nodes is really to *not* jumble the
> location, but for all other node types we are jumbling the location.  At
> least that is my understanding.

I am quite familiar with this term, FWIW.  That's what we've inherited
from the days when this was introduced in pg_stat_statements.
--
Michael


signature.asc
Description: PGP signature


RE: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-01-16 Thread shiy.f...@fujitsu.com
On Wed, Jan 11, 2023 4:31 PM Melih Mutlu  wrote:
> 
> Hi hackers,
> 
> Rebased the patch to resolve conflicts.
> 

Thanks for your patch. Here are some comments.

0001 patch

1. walsender.c
+   /* Create a tuple to send consisten WAL location */

"consisten" should be "consistent" I think.

2. logical.c
+   if (need_full_snapshot)
+   {
+   LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+
+   SpinLockAcquire(&slot->mutex);
+   slot->effective_catalog_xmin = xmin_horizon;
+   slot->data.catalog_xmin = xmin_horizon;
+   slot->effective_xmin = xmin_horizon;
+   SpinLockRelease(&slot->mutex);
+
+   xmin_horizon = GetOldestSafeDecodingTransactionId(!need_full_snapshot);
+   ReplicationSlotsComputeRequiredXmin(true);
+
+   LWLockRelease(ProcArrayLock);
+   }

It seems that we should first get the safe decoding xid, then inform the slot
machinery about the new limit, right? Otherwise the limit will be
InvalidTransactionId and that seems inconsistent with the comment.
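The ordering point can be seen in a standalone sketch.  This uses mock types, not the real slot machinery (the names `MockSlot`, `mock_oldest_safe_decoding_xid`, and `set_slot_xmins` are made up for illustration, and locking is elided): the horizon must be computed before it is copied into the slot, otherwise the slot fields are filled with InvalidTransactionId.

```c
#include <assert.h>

/* Simplified stand-ins for the types involved; illustrative only,
 * not PostgreSQL's real definitions. */
typedef unsigned int TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

typedef struct MockSlot
{
    TransactionId effective_catalog_xmin;
    TransactionId data_catalog_xmin;
    TransactionId effective_xmin;
} MockSlot;

/* Pretend source of the oldest safe decoding xid. */
static TransactionId
mock_oldest_safe_decoding_xid(void)
{
    return 745;
}

/* The ordering the review suggests: compute the horizon *before*
 * publishing it to the slot.  With the order reversed, the slot
 * would publish InvalidTransactionId. */
static void
set_slot_xmins(MockSlot *slot)
{
    TransactionId xmin_horizon = mock_oldest_safe_decoding_xid();

    slot->effective_catalog_xmin = xmin_horizon;
    slot->data_catalog_xmin = xmin_horizon;
    slot->effective_xmin = xmin_horizon;
    /* ...the equivalent of ReplicationSlotsComputeRequiredXmin(true)
     * would follow here, seeing a valid limit. */
}
```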

3. doc/src/sgml/protocol.sgml
+   is used in the currenct transaction. This command is currently only 
supported
+   for logical replication.
+   slots.

We don't need to put "slots" in a new line.


0002 patch

1.
In pg_subscription_rel.h, I think the type of "srrelslotname" can be changed to
NameData, see "subslotname" in pg_subscription.h.

2.
+* Find the logical replication sync worker if exists store
+* the slot number for dropping associated replication slots
+* later.

Should we add comma after "if exists"?

3.
+   PG_FINALLY();
+   {
+   pfree(cmd.data);
+   }
+   PG_END_TRY();
+   \
+   return tablelist;
+}

Do we need the backslash?

4.
+   /*
+* Advance to the LSN got from walrcv_create_slot. This is WAL
+* logged for the purpose of recovery. Locks are to prevent the
+* replication origin from vanishing while advancing.

"walrcv_create_slot" should be changed to
"walrcv_create_slot/walrcv_slot_snapshot" I think.

5.
+   /* Replication drop might still exist. Try to drop */
+   replorigin_drop_by_name(originname, true, false);

Should "Replication drop" be "Replication origin"?

6.
I saw an assertion failure in the following case, could you please look into it?
The backtrace is attached.

-- pub
CREATE TABLE tbl1 (a int, b text);
CREATE TABLE tbl2 (a int primary key, b text);
create publication pub for table tbl1, tbl2;
insert into tbl1 values (1, 'a');
insert into tbl1 values (1, 'a');

-- sub
CREATE TABLE tbl1 (a int primary key, b text);
CREATE TABLE tbl2 (a int primary key, b text);
create subscription sub connection 'dbname=postgres port=5432' publication pub;

Subscriber log:
2023-01-17 14:47:10.054 CST [1980841] LOG:  logical replication apply worker for subscription "sub" has started
2023-01-17 14:47:10.060 CST [1980843] LOG:  logical replication table synchronization worker for subscription "sub", table "tbl1" has started
2023-01-17 14:47:10.070 CST [1980845] LOG:  logical replication table synchronization worker for subscription "sub", table "tbl2" has started
2023-01-17 14:47:10.073 CST [1980843] ERROR:  duplicate key value violates unique constraint "tbl1_pkey"
2023-01-17 14:47:10.073 CST [1980843] DETAIL:  Key (a)=(1) already exists.
2023-01-17 14:47:10.073 CST [1980843] CONTEXT:  COPY tbl1, line 2
2023-01-17 14:47:10.074 CST [1980821] LOG:  background worker "logical replication worker" (PID 1980843) exited with exit code 1
2023-01-17 14:47:10.083 CST [1980845] LOG:  logical replication table synchronization worker for subscription "sub", table "tbl2" has finished
2023-01-17 14:47:10.083 CST [1980845] LOG:  logical replication table synchronization worker for subscription "sub" has moved to sync table "tbl1".
TRAP: failed Assert("node != InvalidRepOriginId"), File: "origin.c", Line: 892, PID: 1980845

Regards,
Shi yu
#0  0x7f38bb78baff in raise () from /lib64/libc.so.6
#1  0x7f38bb75eea5 in abort () from /lib64/libc.so.6
#2  0x00b44501 in ExceptionalCondition (conditionName=0xd0deac "node != InvalidRepOriginId", fileName=0xd0da4d "origin.c", lineNumber=892) at assert.c:66
#3  0x008f93f5 in replorigin_advance (node=0, remote_commit=22144256, local_commit=0, go_backward=true, wal_log=true) at origin.c:892
#4  0x0090cf2f in LogicalRepSyncTableStart (origin_startpos=0x7ffcbe54e808) at tablesync.c:1641
#5  0x00913ee4 in start_table_sync (origin_startpos=0x7ffcbe54e808, myslotname=0x7ffcbe54e790) at worker.c:4419
#6  0x009140be in run_tablesync_worker (options=0x7ffcbe54e7c0, slotname=0x0, originname=0x7ffcbe54e810 "pg_16398_3", originname_size=64, origin_startpos=0x7ffcbe54e808) at worker.c:4497
#7  

Re: Generating code for query jumbling through gen_node_support.pl

2023-01-16 Thread Peter Eisentraut

On 17.01.23 04:48, Michael Paquier wrote:

Anyway, when it comes to the location, another thing that can be
considered here would be to require a node-level flag for the nodes on
which we want to track the location.  This overlaps a bit with the
fields satisfying "($t eq 'int' && $f =~ 'location$')", but it removes
most of the code changes like this one as at the end we only care
about the location for Const nodes:
-   int location;   /* token location, or -1 if unknown */
+   int location pg_node_attr(jumble_ignore);   /* token location, or -1
+* if unknown */

I have taken this approach in v2 of the patch, shaving ~100 lines of
more code as there is no need to mark all these location fields with
"jumble_ignore" anymore, except for Const, of course.


I don't understand why you chose that when we already have an established
way.  This would just make the jumble annotations inconsistent with the
other annotations.


Because we don't want to track the location of all the nodes!  If we
do that, pg_stat_statements would begin to parameterize a lot more
things, from column aliases to full contents of IN or WITH clauses.
The root point is that we only want to track the jumble location for
Const nodes now.  In order to do that, there are two approaches:
- Mark all the locations with jumble_ignore: more invasive, but
it requires only one per-field attribute with "jumble_ignore".  This
is what v1 did.


Ok, I understand now, and I agree with this approach over the opposite. 
I was confused because the snippet you showed above used 
"jumble_ignore", but your patch is correct as it uses "jumble_location".


That said, the term "jumble" is really weird, because in the sense that 
we are using it here it means, approximately, "to mix together", "to 
unify".  So what we are doing with the Const nodes is really to *not* 
jumble the location, but for all other node types we are jumbling the 
location.  At least that is my understanding.
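The jumbling behavior under discussion can be illustrated with a toy sketch (this is not the generated code; `ToyConst` and the FNV-1a accumulation are made up for the example): the jumble hashes every field except the one marked to be ignored, so two Consts that differ only in their token location produce the same fingerprint.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a Const node: only the fields relevant here. */
typedef struct ToyConst
{
    int constvalue;
    int location;   /* would carry pg_node_attr(jumble_ignore) */
} ToyConst;

/* FNV-1a style accumulation over raw bytes. */
static unsigned long long
jumble_bytes(unsigned long long h, const void *p, size_t len)
{
    const unsigned char *b = p;
    while (len--)
        h = (h ^ *b++) * 1099511628211ULL;
    return h;
}

/* Jumble a ToyConst, skipping the ignored location field. */
static unsigned long long
jumble_toy_const(const ToyConst *c)
{
    unsigned long long h = 14695981039346656037ULL;

    h = jumble_bytes(h, &c->constvalue, sizeof(c->constvalue));
    /* c->location intentionally not jumbled */
    return h;
}
```

Two constants written at different spots in a query thus "jumble together", which is what lets pg_stat_statements treat them as the same normalized query.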






Re: Small miscellaneus fixes (Part II)

2023-01-16 Thread John Naylor
On Mon, Jan 16, 2023 at 1:28 PM John Naylor wrote:
>
> I wrote:
> > ...but arguably the earlier check is close enough that it's silly to
> > assert in the "else" branch, and I'd be okay with just nuking those lines.
>
> Another thing that caught my attention is the assumption that unsetting a
> bit is so expensive that we have to first check if it's set, so we may as
> well remove "IS_BRACKET(Np->Num)" as well.
>
> The attached is what I mean. I'll commit this this week unless there are
> objections.

I've pushed this and the cosmetic fix in pg_dump. Those were the only
things I saw that had some interest, so I closed the CF entry.
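The bit-clearing point above can be shown in a standalone sketch (the flag name and value are illustrative, not the real formatting.c definitions): clearing a bit with `&= ~mask` is idempotent and branch-free, so guarding it with an is-set check buys nothing.

```c
#include <assert.h>

#define TOY_BRACKET_FLAG 0x4    /* illustrative flag bit */

/* No need to test whether the bit is set first: clearing is a
 * cheap, idempotent AND with the inverted mask. */
static unsigned int
clear_bracket(unsigned int flags)
{
    return flags & ~TOY_BRACKET_FLAG;
}
```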

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Removing redundant grouping columns

2023-01-16 Thread Richard Guo
On Sun, Jan 15, 2023 at 5:23 AM Tom Lane  wrote:

> vignesh C  writes:
> > The patch does not apply on top of HEAD as in [1], please post a rebased
> patch:
>
> Yeah, sideswiped by 3c6fc5820 apparently.  No substantive change needed.


I looked through these two patches and they look good to me.

BTW, another run of rebase is needed, due to da5800d5fa.

Thanks
Richard


Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2023-01-16 Thread Tom Lane
Richard Guo  writes:
> BTW, I wonder if we should have checked CoercionForm before
> fix_upper_expr_mutator steps into CoerceViaIO->arg to adjust the expr
> there.

I will just quote what it says in primnodes.h:

 * NB: equal() ignores CoercionForm fields, therefore this *must* not carry
 * any semantically significant information.

If you think the planner should act differently for different values of
CoercionForm, you are mistaken.  Maybe this is evidence of some
previous bit of brain-fade, but if so we need to fix that.

regards, tom lane




Re: Make use of assign_checkpoint_completion_target() to calculate CheckPointSegments correctly

2023-01-16 Thread Michael Paquier
On Mon, Dec 26, 2022 at 06:12:34PM +0530, Bharath Rupireddy wrote:
> It looks like assign_checkpoint_completion_target() is defined [1],
> but never used, because of which CheckPointSegments may miss to
> account for changed checkpoint_completion_target. I'm attaching a tiny
> patch to fix this.
> 
> Thoughts?

Oops.  It looks like you are right here.  This would impact the
calculation of CheckPointSegments on reload when
checkpoint_completion_target is updated.  This is wrong since we've
switched to max_wal_size as of 88e9823, so this had better be
backpatched all the way down.

Thoughts?
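The failure mode being fixed is the classic one for derived GUC values: if the assign hook is defined but never wired up, a reload that changes the source setting leaves the derived value stale. A minimal mock of the pattern (all names and the formula here are simplified stand-ins, not PostgreSQL's real variables):

```c
#include <assert.h>

/* Mock of a GUC whose assign hook recomputes a derived setting, in
 * the spirit of CheckPointSegments being derived from max_wal_size
 * and checkpoint_completion_target.  Illustrative names and formula. */
static double completion_target = 0.5;
static int max_wal_size_mb = 1024;
static int checkpoint_segments;

static void
recalc_checkpoint_segments(void)
{
    /* loosely mirrors the real derivation's shape */
    checkpoint_segments =
        (int) (max_wal_size_mb / (16 * (2.0 + completion_target)));
}

/* The assign hook: if this is never registered for the GUC, a reload
 * changes completion_target but checkpoint_segments stays stale. */
static void
assign_completion_target(double newval)
{
    completion_target = newval;
    recalc_checkpoint_segments();
}
```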
--
Michael


signature.asc
Description: PGP signature


Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-16 Thread Masahiko Sawada
On Tue, Jan 17, 2023 at 12:37 PM houzj.f...@fujitsu.com
 wrote:
>
> On Tuesday, January 17, 2023 11:32 AM Peter Smith  
> wrote:
> >
> > On Tue, Jan 17, 2023 at 1:21 PM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > > On Tuesday, January 17, 2023 5:43 AM Peter Smith
> >  wrote:
> > > >
> > > > On Mon, Jan 16, 2023 at 5:41 PM Amit Kapila
> > > > 
> > > > wrote:
> > > > >
> > > > > On Mon, Jan 16, 2023 at 10:24 AM Peter Smith
> > > > > 
> > > > wrote:
> > > > > >
> > > > > > 2.
> > > > > >
> > > > > >  /*
> > > > > > + * Return the pid of the leader apply worker if the given pid
> > > > > > +is the pid of a
> > > > > > + * parallel apply worker, otherwise return InvalidPid.
> > > > > > + */
> > > > > > +pid_t
> > > > > > +GetLeaderApplyWorkerPid(pid_t pid) {  int leader_pid =
> > > > > > +InvalidPid;  int i;
> > > > > > +
> > > > > > + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> > > > > > +
> > > > > > + for (i = 0; i < max_logical_replication_workers; i++) {
> > > > > > + LogicalRepWorker *w = &LogicalRepCtx->workers[i];
> > > > > > +
> > > > > > + if (isParallelApplyWorker(w) && w->proc && pid ==
> > > > > > + w->proc->pid) { leader_pid = w->leader_pid; break; } }
> > > > > > +
> > > > > > + LWLockRelease(LogicalRepWorkerLock);
> > > > > > +
> > > > > > + return leader_pid;
> > > > > > +}
> > > > > >
> > > > > > 2a.
> > > > > > IIUC the IsParallelApplyWorker macro does nothing except check
> > > > > > that the leader_pid is not InvalidPid anyway, so AFAIK this
> > > > > > algorithm does not benefit from using this macro because we will
> > > > > > want to return InvalidPid anyway if the given pid matches.
> > > > > >
> > > > > > So the inner condition can just say:
> > > > > >
> > > > > > if (w->proc && w->proc->pid == pid) { leader_pid =
> > > > > > w->leader_pid; break; }
> > > > > >
> > > > >
> > > > > Yeah, this should also work but I feel the current one is explicit
> > > > > and more clear.
> > > >
> > > > OK.
> > > >
> > > > But, I have one last comment about this function -- I saw there are
> > > > already other functions that iterate max_logical_replication_workers
> > > > like this looking for things:
> > > > - logicalrep_worker_find
> > > > - logicalrep_workers_find
> > > > - logicalrep_worker_launch
> > > > - logicalrep_sync_worker_count
> > > >
> > > > So I felt this new function (currently called
> > > > GetLeaderApplyWorkerPid) ought to be named similarly to those ones.
> > > > e.g. call it something like "logicalrep_worker_find_pa_leader_pid".
> > > >
> > >
> > > I am not sure we can use the name, because currently all the API name
> > > in launcher that used by other module(not related to subscription) are
> > > like AxxBxx style(see the functions in logicallauncher.h).
> > > logicalrep_worker_xxx style functions are currently only declared in
> > > worker_internal.h.
> > >
> >
> > OK. I didn't know there was another header convention that you were 
> > following.
> > In that case, it is fine to leave the name as-is.
>
> Thanks for confirming!
>
> Attach the new version 0001 patch which addressed all other comments.
>

Thank you for updating the patch. Here is one comment:

@@ -426,14 +427,24 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)

/*
 * Show the leader only for active parallel workers.  This
-* leaves the field as NULL for the leader of a parallel
-* group.
+* leaves the field as NULL for the leader of a parallel group
+* or the leader of parallel apply workers.
 */
if (leader && leader->pid != beentry->st_procpid)
{
values[28] = Int32GetDatum(leader->pid);
nulls[28] = false;
}
+   else
+   {
+   int leader_pid = GetLeaderApplyWorkerPid(beentry->st_procpid);
+
+   if (leader_pid != InvalidPid)
+   {
+   values[28] = Int32GetDatum(leader_pid);
+   nulls[28] = false;
+   }
+   }
}

I'm slightly concerned that there could be overhead of executing
GetLeaderApplyWorkerPid() for every backend process except for
parallel query workers. The number of such backends could be large and
GetLeaderApplyWorkerPid() acquires the lwlock. For example, does it
make sense to check (st_backendType == B_BG_WORKER) before calling
GetLeaderApplyWorkerPid()? Or it might not be a problem since it's
LogicalRepWorkerLock which is not likely to be contended.
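The suggested pre-filter can be sketched as follows, with mock types (the enum, counters, and function names are made up for the illustration): only background workers can be parallel apply workers, so checking the backend type first skips the lock-taking lookup for every other backend.

```c
#include <assert.h>

/* Illustrative mock of the proposed cheap pre-filter. */
typedef enum MockBackendType
{
    MOCK_B_BACKEND,
    MOCK_B_BG_WORKER,
    MOCK_B_WALSENDER
} MockBackendType;

static int lookup_calls = 0;

/* Stand-in for GetLeaderApplyWorkerPid(): pretend it is the
 * expensive, lock-acquiring path. */
static int
mock_leader_lookup(int pid)
{
    (void) pid;
    lookup_calls++;
    return -1;              /* InvalidPid: not a parallel apply worker */
}

static int
leader_pid_for(MockBackendType type, int pid)
{
    if (type != MOCK_B_BG_WORKER)
        return -1;          /* cheap filter avoids the lock entirely */
    return mock_leader_lookup(pid);
}
```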

Regards,

-- 
Masahiko Sawada
Amazon 

Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2023-01-16 Thread Richard Guo
On Tue, Jan 17, 2023 at 11:39 AM David Rowley  wrote:

> On Tue, 17 Jan 2023 at 13:16, Dean Rasheed 
> wrote:
> > I took a look at this, and I agree that the best solution is probably
> > to have make_pathkeys_for_groupagg() ignore Aggrefs that contain
> > volatile functions.
>
> Thanks for giving that some additional thought.  I've just pushed a
> fix which adjusts things that way.


This makes a lot of sense.  I agree that we shouldn't do pre-sorting for
volatile sort expressions, especially when there are multiple aggregates
with the same volatile sort expression.

Not related to this specific issue, but I find sorting by volatile
expression is confusing in different scenarios.  Consider the two
queries given by David

Query 1:
select string_agg(random()::text, ',' order by random()) from
generate_series(1,3);

Query 2:
select random()::text from generate_series(1,3) order by random();

Considering the targetlist as Aggref->args or Query->targetList, in both
queries we would add an additional TargetEntry (as resjunk column) for
the ORDER BY item 'random()', because it's not present in the existing
targetlist.  Note that the existing TargetEntry for 'random()::text' is
a CoerceViaIO expression which is an explicit cast, so we cannot strip
it and match it to the ORDER BY item.  Thus we would have two random()
FuncExprs in the final targetlist, for both queries.

In query 1 we call random() twice for each tuple, one for the original
TargetEntry 'random()::text', and one for the TargetEntry of the ORDER
BY item 'random()', and do the sorting according to the second call
results.  Thus we would notice the final output is unsorted because it's
from the first random() call.

However, in query 2 we have the ORDER BY item 'random()' in the
scan/join node's targetlist.  And then for the two random() FuncExprs in
the final targetlist, set_plan_references would adjust both of them to
refer to the outputs of the scan/join node.  Thus random() is actually
called only once for each tuple and we would find the final output is
sorted.

It seems we fail to keep consistent about the behavior of sorting by
volatile expression in the two scenarios.

BTW, I wonder if we should have checked CoercionForm before
fix_upper_expr_mutator steps into CoerceViaIO->arg to adjust the expr
there.  It seems parser checks it and only strips implicit coercions
when matching TargetEntry expr to ORDER BY item.

Thanks
Richard


Re: constify arguments of copy_file() and copydir()

2023-01-16 Thread Michael Paquier
On Mon, Jan 16, 2023 at 10:53:40AM +0900, Michael Paquier wrote:
> +1.

While I don't forget about this thread..  Any objections if I were to
apply that?
--
Michael


signature.asc
Description: PGP signature


RE: Perform streaming logical transactions by background workers and parallel apply

2023-01-16 Thread houzj.f...@fujitsu.com
On Tuesday, January 17, 2023 12:55 PM Amit Kapila  
wrote:
> 
> On Tue, Jan 17, 2023 at 8:59 AM Amit Kapila  wrote:
> >
> > On Tue, Jan 17, 2023 at 8:35 AM Masahiko Sawada
>  wrote:
> > >
> > > On Mon, Jan 16, 2023 at 3:19 PM Amit Kapila 
> wrote:
> > > >
> > > > Okay, I have added the comments in get_transaction_apply_action()
> > > > and updated the comments to refer to the enum TransApplyAction
> > > > where all the actions are explained.
> > >
> > > Thank you for the patch.
> > >
> > > @@ -1710,6 +1712,7 @@ apply_handle_stream_stop(StringInfo s)
> > > }
> > >
> > > in_streamed_transaction = false;
> > > +   stream_xid = InvalidTransactionId;
> > >
> > > We reset stream_xid also in stream_close_file() but probably it's no
> > > longer necessary?
> > >
> >
> > I think so.
> >
> > > How about adding an assertion in apply_handle_stream_start() to make
> > > sure the stream_xid is invalid?
> > >
> >
> > I think it would be better to add such an assert in
> > apply_handle_begin/apply_handle_begin_prepare because there won't be a
> > problem if we receive a start_stream message even when stream_xid is valid.
> > However, maybe it is better to add in all three functions
> >
> (apply_handle_begin/apply_handle_begin_prepare/apply_handle_stream_star
> t).
> > What do you think?
> >
> > > ---
> > > It's not related to this issue but I realized that if the action
> > > returned by get_transaction_apply_action() is not handled in the
> > > switch statement, we do only Assert(false). Is it better to raise an
> > > error like "unexpected apply action %d" just in case in order to
> > > detect failure cases also in the production environment?
> > >
> >
> > Yeah, that may be better. Shall we do that as part of this patch only
> > or as a separate patch?
> >
> 
> Please find attached the updated patches to address the above comments. I
> think we can combine and commit them as one patch as both are related.

Thanks for fixing these.
I have confirmed that all regression tests passed after applying the patches.
And the patches look good to me.

Best regards,
Hou zj


Re: doc: add missing "id" attributes to extension packaging page

2023-01-16 Thread Brar Piening

On 17.01.2023 at 02:05, Karl O. Pinc wrote:
> Or maybe the right way is to set a mode at the very top,
> the first apply-templates call, and not mess with the
> built-in templates at all.  (You'd write your own
> "postgres-mode" templates the same way, to "wrap"
> and call the default templates.)
>
> Think of the mode as an implicit argument that's preserved and
> passed down through each template invocation without having to
> be explicitly specified by the calling code.


I think the document you're missing is [1].

There are multiple ways to customize DocBook XSL output and it sounds
like you want me to write a customization layer which I didn't do
because there is precedent that the typical "way to do it" (TM) in the
PostgreSQL project is [2].

Regards,

Brar

[1] http://www.sagehill.net/docbookxsl/CustomizingPart.html
[2] http://www.sagehill.net/docbookxsl/ReplaceTemplate.html





Re: recovery modules

2023-01-16 Thread Michael Paquier
On Mon, Jan 16, 2023 at 02:40:40PM -0800, Nathan Bossart wrote:
> On Mon, Jan 16, 2023 at 04:36:01PM +0900, Michael Paquier wrote:
> > Once this issue was fixed, nothing else stood out, so applied this
> > part.
> 
> Thanks!  I've attached a rebased version of the rest of the patch set.

When it comes to 0002, the only difference between the three code
paths of shell_recovery_end(), shell_archive_cleanup() and
shell_restore() is the presence of BuildRestoreCommand().  However
this is now just a thin wrapper of replace_percent_placeholders() that
does just one extra make_native_path() for the xlogpath.

Could it be cleaner in the long term to remove entirely
BuildRestoreCommand() and move the conversion of the xlogpath with
make_native_path() one level higher in the stack?
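For context, the kind of %-placeholder expansion being discussed can be sketched in a few lines. This is a much-simplified illustration of what a `replace_percent_placeholders()`-style helper does, not the real implementation: expand `%f` to the file name and `%p` to the path, copying everything else through (`%%` escaping and error reporting omitted).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified %-placeholder expansion: %f -> file, %p -> path. */
static void
expand_command(char *dst, size_t dstlen, const char *tmpl,
               const char *file, const char *path)
{
    size_t n = 0;

    for (const char *s = tmpl; *s && n + 1 < dstlen; s++)
    {
        const char *sub = NULL;

        if (s[0] == '%' && s[1] == 'f')
            sub = file;
        else if (s[0] == '%' && s[1] == 'p')
            sub = path;

        if (sub)
        {
            n += snprintf(dst + n, dstlen - n, "%s", sub);
            if (n >= dstlen)
                n = dstlen - 1;     /* clamp on truncation */
            s++;                    /* skip the format character */
        }
        else
            dst[n++] = *s;
    }
    dst[n] = '\0';
}
```

In the real code the question is only where the extra `make_native_path()` call on the expanded xlogpath should live, since the expansion itself is this thin.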
--
Michael


signature.asc
Description: PGP signature


Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf

2023-01-16 Thread Michael Paquier
On Mon, Jan 16, 2023 at 11:53:57AM +0100, Jelte Fennema wrote:
>> Perhaps it would be simpler to use copy_auth_token() in this code path
>> and always free the resulting token?
> 
> I initially tried that when working on the patch, but copy_auth_token
> (surprisingly) doesn't copy the regex field into the new AuthToken.
> So we'd have to regenerate it conditionally. Making the copy
> conditional seemed just as simple code-wise, with the added
> bonus that it's not doing a useless copy.

Okay, I can live with that.

>> In the code path where system-user is a regexp, could it be better
>> to skip the replacement of \1 in the new AuthToken if pg-user is
>> itself a regexp?  The compiled regexp would be the same, but it could
>> be considered as a bit confusing, as it can be thought that the
>> compiled regexp of pg-user happened after the replacement?
> 
> I updated 0004 to prioritize membership checks and regexes over
> substitution of \1. I also added tests for this. Prioritizing "all" over
> substitution of \1 is not necessary, since by definition "all" does
> not include \1.

Thanks, 0003 is OK, so applied now.

0004 looks fine as well, be it for the tests (I am hesitating to tweak
things a bit here actually for the role names), the code or the docs,
still I am planning a second lookup.
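For readers following the \1 discussion, the substitution step can be illustrated standalone with POSIX regexes (this is only a sketch of the idea, not the pg_ident.conf code: `map_user` and its behavior on non-matching input are made up for the example). A system-user regex captures part of the name, and "\1" in the pg-user field is replaced with the captured text.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Sketch of pg_ident.conf-style mapping: substitute the first capture
 * group of sysuser_re (matched against ident) for "\1" in pguser_tmpl.
 * Falls back to the template verbatim when nothing applies. */
static void
map_user(const char *sysuser_re, const char *pguser_tmpl,
         const char *ident, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];

    if (regcomp(&re, sysuser_re, REG_EXTENDED) != 0)
    {
        snprintf(out, outlen, "%s", pguser_tmpl);
        return;
    }
    if (regexec(&re, ident, 2, m, 0) == 0 && m[1].rm_so != -1)
    {
        const char *bs = strstr(pguser_tmpl, "\\1");

        if (bs)
        {
            snprintf(out, outlen, "%.*s%.*s%s",
                     (int) (bs - pguser_tmpl), pguser_tmpl,
                     (int) (m[1].rm_eo - m[1].rm_so), ident + m[1].rm_so,
                     bs + 2);
            regfree(&re);
            return;
        }
    }
    snprintf(out, outlen, "%s", pguser_tmpl);
    regfree(&re);
}
```

The question in the thread is where this substitution should rank relative to membership checks, regex pg-users, and "all" — the patch prioritizes the first two over expanding \1.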
--
Michael


signature.asc
Description: PGP signature


Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-16 Thread Masahiko Sawada
On Tue, Jan 17, 2023 at 1:55 PM Amit Kapila  wrote:
>
> On Tue, Jan 17, 2023 at 8:59 AM Amit Kapila  wrote:
> >
> > On Tue, Jan 17, 2023 at 8:35 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Mon, Jan 16, 2023 at 3:19 PM Amit Kapila  
> > > wrote:
> > > >
> > > > Okay, I have added the comments in get_transaction_apply_action() and
> > > > updated the comments to refer to the enum TransApplyAction where all
> > > > the actions are explained.
> > >
> > > Thank you for the patch.
> > >
> > > @@ -1710,6 +1712,7 @@ apply_handle_stream_stop(StringInfo s)
> > > }
> > >
> > > in_streamed_transaction = false;
> > > +   stream_xid = InvalidTransactionId;
> > >
> > > We reset stream_xid also in stream_close_file() but probably it's no
> > > longer necessary?
> > >
> >
> > I think so.
> >
> > > How about adding an assertion in apply_handle_stream_start() to make
> > > sure the stream_xid is invalid?
> > >
> >
> > I think it would be better to add such an assert in
> > apply_handle_begin/apply_handle_begin_prepare because there won't be a
> > problem if we receive a start_stream message even when stream_xid is valid.
> > However, maybe it is better to add in all three functions
> > (apply_handle_begin/apply_handle_begin_prepare/apply_handle_stream_start).
> > What do you think?
> >
> > > ---
> > > It's not related to this issue but I realized that if the action
> > > returned by get_transaction_apply_action() is not handled in the
> > > switch statement, we do only Assert(false). Is it better to raise an
> > > error like "unexpected apply action %d" just in case in order to
> > > detect failure cases also in the production environment?
> > >
> >
> > Yeah, that may be better. Shall we do that as part of this patch only
> > or as a separate patch?
> >
>
> Please find attached the updated patches to address the above
> comments. I think we can combine and commit them as one patch as both
> are related.

Thank you for the patches! Looks good to me. And +1 to merge them.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




RE: Ability to reference other extensions by schema in extension scripts

2023-01-16 Thread Regina Obe
> 
> On Thu, Dec 15, 2022 at 08:04:22AM -0500, Regina Obe wrote:
> > > On Tue, Nov 22, 2022 at 11:24:19PM -0500, Regina Obe wrote:
> > >
> > > > If an extension is required by another extension and that required
> > > > extension schema is referenced in the extension scripts using the
> > > > @extschema:extensionname@ syntax, then ideally we should prevent
> > > > the required extension from being relocatable.  This would prevent
> > > > a user from accidentally moving the required extension, thus
> > > > breaking the dependent extensions.
> > > >
> > > > I didn't add that feature cause I wasn't sure if it was
> > > > overstepping the bounds of what should be done, or if we leave it
> > > > up to the user to just know better.
> > >
> > > An alternative would be to forbid using @extschema:extensionname@ to
> > > reference relocatable extensions. DBA can toggle relocatability of
> > > an extension to allow it to be referenced.
> >
> > That would be hard to do in a DbaaS setup and not many users know they
> > can fiddle with extension control files.
> > Plus those would get overwritten with upgrades.
> 
> Wouldn't this also be the case if you override relocatability ?
> Case:
> 
>   - Install fuzzystrmatch, marked as relocatable
>   - Install ext2 depending on the former, which is then marked
> non-relocatable
>   - Upgrade database -> fuzzystrmatch becomes relocatable again
>   - Change fuzzystrmatch schema BREAKING ext2
> 

Somewhat. It would be an issue if someone does

ALTER EXTENSION fuzzystrmatch UPDATE;

And 

ALTER EXTENSION fuzzystrmatch SET SCHEMA a_different_schema;

Otherwise the relocatability of an already installed extension wouldn't
change, even during an upgrade. I haven't checked pg_upgrade, but I suspect
it wouldn't change there either.

It's my understanding that once an extension is installed, its relocatable
status is recorded in the pg_extension table.  So it doesn't matter at that
point what the control file says. However, if someone does update the
extension, then yes, it would look at the control file and make it relocatable
again.

I just tested this fiddling with postgis extension and moving it and then
upgrading.

UPDATE pg_extension SET extrelocatable = true where extname = 'postgis';
ALTER EXTENSION postgis SET schema postgis;

ALTER EXTENSION postgis UPDATE;
e.g. if the above is already at latest version, get notice
NOTICE:  version "3.3.2" of extension "postgis" is already installed
(and the extension is still relocatable)

-- if the extension can be upgraded
ALTER EXTENSION postgis UPDATE;

-- no longer relocatable (because postgis control file has relocatable =
false)

But honestly I don't expect this to be a huge issue, more of just an extra
safety block.
Not a bullet-proof safety block though.

> Allowing to relocate a dependency of other extensions using the
> @extschema@ syntax is very dangerous.
> 
> I've seen that PostgreSQL itself doesn't even bother to replace
@extschema@
> IF the extension using it doesn't mark itself as non-relocatable. For
consistency
> this patch should basically refuse to expand @extschema:fuzzystrmatch@ if
> "fuzzystrmatch" extension is relocatable.
> 
> Changing the current behaviour of PostgreSQL could be proposed but I don't
> think it's to be done in this thread ?
> 
> So my suggestion is to start consistent (do not expand if referenced
extension
> is relocatable).
> 
> 
> --strk;

I don't agree. That would make this patch much less useful.
For example, let's revisit my postgis_tiger_geocoder extension, which is a
good bulk of the reason why I want this.

I use indexes that use postgis_tiger_geocoder functions that call
fuzzystrmatch, which causes pg_restore to break on reload, among other
issues, because I'm not explicitly referencing the function schema.  With
your proposal I would now have to demand that the PostgreSQL project make
fuzzystrmatch non-relocatable so I can use this feature.  It is so rare for
people to go around moving the locations of their extensions once set that I
honestly don't think the ALTER EXTENSION .. UPDATE hole is a huge deal.

I'd be more annoyed having to beg an extension provider to mark their
extension as not relocatable so that I could explicitly reference the
location of their extensions.

And even then - think about it.  I ask an extension provider to make their
extension non-relocatable.  They do, but some people are using a version
from before it was marked non-relocatable.  So now if I change my code,
users can't install my extension, because they are using a version from
before that change and I'm using the new syntax.

What would be more bullet-proof is having an extra column in pg_extension or
adding an extra array element to pg_extension.extcondition[] that allows you
to say "Hey, don't allow this to be relocatable cause other extensions
depend on it that have explicitly referenced the schema."

Thanks,
Regina







Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-16 Thread Amit Kapila
On Tue, Jan 17, 2023 at 8:59 AM Amit Kapila  wrote:
>
> On Tue, Jan 17, 2023 at 8:35 AM Masahiko Sawada  wrote:
> >
> > On Mon, Jan 16, 2023 at 3:19 PM Amit Kapila  wrote:
> > >
> > > Okay, I have added the comments in get_transaction_apply_action() and
> > > updated the comments to refer to the enum TransApplyAction where all
> > > the actions are explained.
> >
> > Thank you for the patch.
> >
> > @@ -1710,6 +1712,7 @@ apply_handle_stream_stop(StringInfo s)
> > }
> >
> > in_streamed_transaction = false;
> > +   stream_xid = InvalidTransactionId;
> >
> > We reset stream_xid also in stream_close_file() but probably it's no
> > longer necessary?
> >
>
> I think so.
>
> > How about adding an assertion in apply_handle_stream_start() to make
> > sure the stream_xid is invalid?
> >
>
> I think it would be better to add such an assert in
> apply_handle_begin/apply_handle_begin_prepare because there won't be a
> problem if we receive a start_stream message even when stream_xid is valid.
> However, maybe it is better to add in all three functions
> (apply_handle_begin/apply_handle_begin_prepare/apply_handle_stream_start).
> What do you think?
>
> > ---
> > It's not related to this issue but I realized that if the action
> > returned by get_transaction_apply_action() is not handled in the
> > switch statement, we do only Assert(false). Is it better to raise an
> > error like "unexpected apply action %d" just in case in order to
> > detect failure cases also in the production environment?
> >
>
> Yeah, that may be better. Shall we do that as part of this patch only
> or as a separate patch?
>

Please find attached the updated patches to address the above
comments. I think we can combine and commit them as one patch as both
are related.

-- 
With Regards,
Amit Kapila.


v2-0001-Improve-the-code-to-decide-the-apply-action.patch
Description: Binary data


v2-0002-Change-assert-to-elog-for-unexpected-apply-action.patch
Description: Binary data


Re: New strategies for freezing, advancing relfrozenxid early

2023-01-16 Thread Peter Geoghegan
On Mon, Jan 16, 2023 at 8:13 PM Dilip Kumar  wrote:
> I think that it makes sense to keep 'vacuum_freeze_strategy_threshold'
> strictly for freezing.  But the point is that the eager scanning
> strategy is driven by the freezing needs of the table (tableagefrac),
> which makes sense; but if we have selected eager scanning based on
> the table age and its freezing needs, then why don't we force eager
> freezing as well, since eager scanning was selected to satisfy the
> freezing need in the first place?

Don't think of eager scanning as the new name for aggressive mode --
it's a fairly different concept, because we care about costs now.
Eager scanning can be chosen just because it's very cheap relative to
the alternative of lazy scanning, even when relfrozenxid is still very
recent. (This kind of behavior isn't really new [1], but the exact
implementation from the patch is new.)

Tables such as pgbench_branches and pgbench_tellers will reliably use
eager scanning strategy, no matter how any GUC has been set -- just
because the added cost is always zero (relative to lazy scanning). It
really doesn't matter how far along tableagefrac is here, ever. These
same tables will never use eager freezing strategy, unless the
vacuum_freeze_strategy_threshold GUC is misconfigured. (This is
another example of how scanning strategy and freezing strategy may
differ for the same table.)

You do have a good point, though. I think that I know what you mean.
Note that antiwraparound autovacuums (or VACUUMs of tables very near
to that point) *will* always use both the eager freezing strategy and
the eager scanning strategy -- which is probably close to what you
meant.

The important point is that there can be more than one reason to
prefer one strategy to another -- and the reasons can be rather
different. Occasionally it'll be a combination of two factors together
that push things in favor of one strategy over the other -- even
though either factor on its own would not have resulted in the same
choice.

[1] 
https://wiki.postgresql.org/wiki/Freezing/skipping_strategies_patch:_motivating_examples#Constantly_updated_tables_.28usually_smaller_tables.29
-- 
Peter Geoghegan




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-16 Thread shveta malik
On Tue, Jan 17, 2023 at 9:07 AM houzj.f...@fujitsu.com
 wrote:
>
> On Tuesday, January 17, 2023 11:32 AM Peter Smith  
> wrote:
> >
> > On Tue, Jan 17, 2023 at 1:21 PM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > > On Tuesday, January 17, 2023 5:43 AM Peter Smith
> >  wrote:
> > > >
> > > > On Mon, Jan 16, 2023 at 5:41 PM Amit Kapila
> > > > 
> > > > wrote:
> > > > >
> > > > > On Mon, Jan 16, 2023 at 10:24 AM Peter Smith
> > > > > 
> > > > wrote:
> > > > > >
> > > > > > 2.
> > > > > >
> > > > > >  /*
> > > > > > + * Return the pid of the leader apply worker if the given pid
> > > > > > +is the pid of a
> > > > > > + * parallel apply worker, otherwise return InvalidPid.
> > > > > > + */
> > > > > > +pid_t
> > > > > > +GetLeaderApplyWorkerPid(pid_t pid) {  int leader_pid =
> > > > > > +InvalidPid;  int i;
> > > > > > +
> > > > > > + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> > > > > > +
> > > > > > + for (i = 0; i < max_logical_replication_workers; i++) {
> > > > > > + LogicalRepWorker *w = >workers[i];
> > > > > > +
> > > > > > + if (isParallelApplyWorker(w) && w->proc && pid ==
> > > > > > + w->proc->pid) { leader_pid = w->leader_pid; break; } }
> > > > > > +
> > > > > > + LWLockRelease(LogicalRepWorkerLock);
> > > > > > +
> > > > > > + return leader_pid;
> > > > > > +}
> > > > > >
> > > > > > 2a.
> > > > > > IIUC the IsParallelApplyWorker macro does nothing except check
> > > > > > that the leader_pid is not InvalidPid anyway, so AFAIK this
> > > > > > algorithm does not benefit from using this macro because we will
> > > > > > want to return InvalidPid anyway if the given pid matches.
> > > > > >
> > > > > > So the inner condition can just say:
> > > > > >
> > > > > > if (w->proc && w->proc->pid == pid) { leader_pid =
> > > > > > w->leader_pid; break; }
> > > > > >
> > > > >
> > > > > Yeah, this should also work but I feel the current one is explicit
> > > > > and more clear.
> > > >
> > > > OK.
> > > >
> > > > But, I have one last comment about this function -- I saw there are
> > > > already other functions that iterate max_logical_replication_workers
> > > > like this looking for things:
> > > > - logicalrep_worker_find
> > > > - logicalrep_workers_find
> > > > - logicalrep_worker_launch
> > > > - logicalrep_sync_worker_count
> > > >
> > > > So I felt this new function (currently called
> > > > GetLeaderApplyWorkerPid) ought to be named similarly to those ones.
> > > > e.g. call it something like "logicalrep_worker_find_pa_leader_pid".
> > > >
> > >
> > > I am not sure we can use that name, because currently all the API names
> > > in the launcher that are used by other modules (not related to
> > > subscription) follow the AxxBxx style (see the functions in
> > > logicallauncher.h).
> > > logicalrep_worker_xxx style functions are currently only declared in
> > > worker_internal.h.
> > >
> >
> > OK. I didn't know there was another header convention that you were 
> > following.
> > In that case, it is fine to leave the name as-is.
>
> Thanks for confirming!
>
> Attach the new version 0001 patch which addressed all other comments.
>
> Best regards,
> Hou zj

Hello Hou-san,

1. Do we need to extend test-cases to review the leader_pid column in
pg_stats tables?
2. Do we need to follow the naming convention for
'GetLeaderApplyWorkerPid' like other functions in the same file, which
start with 'logicalrep_'?

thanks
Shveta




Re: New strategies for freezing, advancing relfrozenxid early

2023-01-16 Thread Dilip Kumar
On Mon, Jan 16, 2023 at 11:31 PM Peter Geoghegan  wrote:
>
> > I think '(nextXID - cutoffs->relfrozenxid) / freeze_table_age' should
> > be the actual fraction right?  What is the point of adding 0.5 to the
> > divisor?  If there is a logical reason, maybe we can explain in the
> > comments.
>
> It's just a way of avoiding division by zero.

oh, correct :)
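For what it's worth, the guard being discussed can be modeled in a few lines of C. The helper below is only an illustrative sketch under assumed names (nextXID, relfrozenxid, freeze_table_age follow the thread), not the patch's actual code:

```c
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Illustrative model of the tableagefrac computation discussed above.
 * Adding 0.5 to the divisor keeps the expression well-defined even
 * when freeze_table_age is 0, without meaningfully changing the
 * result for realistic settings (which are in the millions).
 */
static double
compute_tableagefrac(TransactionId nextXID, TransactionId relfrozenxid,
					 int freeze_table_age)
{
	return (double) (nextXID - relfrozenxid) /
		((double) freeze_table_age + 0.5);
}
```

With freeze_table_age = 0 the divisor is simply 0.5, so the division never traps.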

> > While looking into the logic of 'lazy_scan_strategy', I think the idea
> > looks very good but the only thing is that
> > we have kept eager freeze and eager scan completely independent.
> > Don't you think that if a table is chosen for an eager scan
> > then we should force the eager freezing as well?
>
> Earlier versions of the patch kind of worked that way.
> lazy_scan_strategy would actually use twice the GUC setting to
> determine scanning strategy. That approach could make our "transition
> from lazy to eager strategies" involve an excessive amount of
> "catch-up freezing" in the VACUUM operation that advanced relfrozenxid
> for the first time, which you see an example of here:
>
> https://wiki.postgresql.org/wiki/Freezing/skipping_strategies_patch:_motivating_examples#Patch
>
> Now we treat the scanning and freezing strategies as two independent
> choices. Of course they're not independent in any practical sense, but
> I think it's slightly simpler and more elegant that way -- it makes
> the GUC vacuum_freeze_strategy_threshold strictly about freezing
> strategy, while still leading to VACUUM advancing relfrozenxid in a
> way that makes sense. It just happens as a second order effect. Why
> add a special case?

I think that it makes sense to keep 'vacuum_freeze_strategy_threshold'
strictly for freezing.  But the point is that the eager scanning
strategy is driven by the freezing needs of the table (tableagefrac),
which makes sense; but if we have selected eager scanning based on the
table age and its freezing needs, then why don't we force eager
freezing as well, since eager scanning was selected to satisfy the
freezing need in the first place?  OTOH, eager scanning might get
selected because it appears that we would not have to scan too many
extra pages compared to a lazy scan, and in those cases forcing eager
freezing might not be wise.  So maybe it is a good idea to keep them
the way you have in your patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-16 Thread Peter Geoghegan
On Mon, Jan 16, 2023 at 8:25 AM Robert Haas  wrote:
> I really dislike formulas like Min(freeze_max_age * 2, 1 billion).
> That looks completely magical from a user perspective. Some users
> aren't going to understand autovacuum behavior at all. Some will, and
> will be able to compare age(relfrozenxid) against
> autovacuum_freeze_max_age. Very few people are going to think to
> compare age(relfrozenxid) against some formula based on
> autovacuum_freeze_max_age. I guess if we document it, maybe they will.

What do you think of Andres' autovacuum_no_auto_cancel_age proposal?

As I've said several times already, I am by no means attached to the
current formula.

> I do like the idea of driving the auto-cancel behavior off of the
> results of previous attempts to vacuum the table. That could be done
> independently of the XID age of the table.

Even when the XID age of the table has already significantly surpassed
autovacuum_freeze_max_age, say due to autovacuum worker starvation?

> If we've failed to vacuum
> the table, say, 10 times, because we kept auto-cancelling, it's
> probably appropriate to force the issue.

I suggested 1000 times upthread. 10 times seems very low, at least if
"number of times cancelled" is the sole criterion, without any
attention paid to relfrozenxid age or some other tiebreaker.

> It doesn't really matter
> whether the autovacuum triggered because of bloat or because of XID
> age. Letting either of those things get out of control is bad.

While inventing a new no-auto-cancel behavior that prevents bloat from
getting completely out of control may well have merit, I don't see why
it needs to be attached to this other effort.

I think that the vast majority of individual tables have autovacuums
cancelled approximately never, and so my immediate concern is
ameliorating cases where not being able to auto-cancel once in a blue
moon causes an outage. Sure, the opposite problem also exists, and I
think that it would be really bad if it was made significantly worse
as an unintended consequence of a patch that addressed just the first
problem. But that doesn't mean we have to solve both problems together
at the same time.

> But at that point a lot of harm has already
> been done. In a frequently updated table, waiting 300 million XIDs to
> stop cancelling the vacuum is basically condemning the user to have to
> run VACUUM FULL. The table can easily be ten or a hundred times bigger
> than it should be by that point.

The rate at which relfrozenxid ages is just about useless as a proxy
for how much wall clock time has passed with a given workload --
workloads are usually very bursty. It's much worse still as a proxy
for what has changed in the table; completely static tables have their
relfrozenxid age at exactly the same rate as the most frequently
updated table in the same database (the table that "consumes the most
XIDs"). So while antiwraparound autovacuum no-auto-cancel behavior may
indeed save the user from problems with serious bloat, it will happen
pretty much by mistake. Not that it doesn't happen all the same -- of
course it does.

That factor (the mistake factor) doesn't mean I take the point any
less seriously. What I don't take seriously is the idea that the
precise XID age was ever crucially important.

More generally, I just don't accept that this leaves us with no room
for something along the lines of my proposal, such as Andres'
autovacuum_no_auto_cancel_age concept. As I've said already, there will
usually be a very asymmetric quality to the problem in cases like the
Joyent outage. Even a modest amount of additional XID-space-headroom
will very likely be all that will be needed at the critical juncture.
It may not be perfect, but it still has every potential to make things
safer for some users, without making things any less safe for other
users.

--
Peter Geoghegan




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-16 Thread Peter Smith
On Tue, Jan 17, 2023 at 2:37 PM houzj.f...@fujitsu.com
 wrote:
>
> On Tuesday, January 17, 2023 11:32 AM Peter Smith  
> wrote:
> >
> > On Tue, Jan 17, 2023 at 1:21 PM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > > On Tuesday, January 17, 2023 5:43 AM Peter Smith
> >  wrote:
> > > >
> > > > On Mon, Jan 16, 2023 at 5:41 PM Amit Kapila
> > > > 
> > > > wrote:
> > > > >
> > > > > On Mon, Jan 16, 2023 at 10:24 AM Peter Smith
> > > > > 
> > > > wrote:
> > > > > >
> > > > > > 2.
> > > > > >
> > > > > >  /*
> > > > > > + * Return the pid of the leader apply worker if the given pid
> > > > > > +is the pid of a
> > > > > > + * parallel apply worker, otherwise return InvalidPid.
> > > > > > + */
> > > > > > +pid_t
> > > > > > +GetLeaderApplyWorkerPid(pid_t pid) {  int leader_pid =
> > > > > > +InvalidPid;  int i;
> > > > > > +
> > > > > > + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> > > > > > +
> > > > > > + for (i = 0; i < max_logical_replication_workers; i++) {
> > > > > > + LogicalRepWorker *w = >workers[i];
> > > > > > +
> > > > > > + if (isParallelApplyWorker(w) && w->proc && pid ==
> > > > > > + w->proc->pid) { leader_pid = w->leader_pid; break; } }
> > > > > > +
> > > > > > + LWLockRelease(LogicalRepWorkerLock);
> > > > > > +
> > > > > > + return leader_pid;
> > > > > > +}
> > > > > >
> > > > > > 2a.
> > > > > > IIUC the IsParallelApplyWorker macro does nothing except check
> > > > > > that the leader_pid is not InvalidPid anyway, so AFAIK this
> > > > > > algorithm does not benefit from using this macro because we will
> > > > > > want to return InvalidPid anyway if the given pid matches.
> > > > > >
> > > > > > So the inner condition can just say:
> > > > > >
> > > > > > if (w->proc && w->proc->pid == pid) { leader_pid =
> > > > > > w->leader_pid; break; }
> > > > > >
> > > > >
> > > > > Yeah, this should also work but I feel the current one is explicit
> > > > > and more clear.
> > > >
> > > > OK.
> > > >
> > > > But, I have one last comment about this function -- I saw there are
> > > > already other functions that iterate max_logical_replication_workers
> > > > like this looking for things:
> > > > - logicalrep_worker_find
> > > > - logicalrep_workers_find
> > > > - logicalrep_worker_launch
> > > > - logicalrep_sync_worker_count
> > > >
> > > > So I felt this new function (currently called
> > > > GetLeaderApplyWorkerPid) ought to be named similarly to those ones.
> > > > e.g. call it something like "logicalrep_worker_find_pa_leader_pid".
> > > >
> > >
> > > I am not sure we can use that name, because currently all the API names
> > > in the launcher that are used by other modules (not related to
> > > subscription) follow the AxxBxx style (see the functions in
> > > logicallauncher.h).
> > > logicalrep_worker_xxx style functions are currently only declared in
> > > worker_internal.h.
> > >
> >
> > OK. I didn't know there was another header convention that you were 
> > following.
> > In that case, it is fine to leave the name as-is.
>
> Thanks for confirming!
>
> Attach the new version 0001 patch which addressed all other comments.
>

OK. I checked the differences between patches v81-0001/v82-0001 and
found everything I was expecting to see.

I have no more review comments for v82-0001.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Generating code for query jumbling through gen_node_support.pl

2023-01-16 Thread Michael Paquier
On Mon, Jan 16, 2023 at 03:13:35PM +0100, Peter Eisentraut wrote:
> On 13.01.23 08:54, Michael Paquier wrote:
>> Using a "jumble_ignore" flag is equally invasive to using an
>> "jumble_include" flag for each field, I am afraid, as the number of
>> fields in the nodes included in jumbles is pretty equivalent to the
>> number of fields ignored.  I tend to prefer the approach of ignoring
things though, which is more consistent with the practice for node
read, write and copy.
> 
> I concur that jumble_ignore is better.  I suppose you placed the
> jumble_ignore markers to maintain parity with the existing code, but I think
that some of the markers are actually wrong and are just errors of omission in
> the existing code (such as Query.override, to pick a random example).  The
> way you have structured this would allow us to find and analyze such errors
> better.

Thanks.  Yes, I have made an effort to maintain exact compatibility
with the existing code for now.  My take is that removing things from
or adding more things into the jumble deserves its own discussion.
That becomes easier once this code is automated from the node
definitions; it would then not be difficult to adjust HEAD
accordingly.  Only the analysis effort is different.

>> Anyway, when it comes to the location, another thing that can be
>> considered here would be to require a node-level flag for the nodes on
>> which we want to track the location.  This overlaps a bit with the
>> fields satisfying "($t eq 'int' && $f =~ 'location$')", but it removes
>> most of the code changes like this one as at the end we only care
>> about the location for Const nodes:
>> -   int location;   /* token location, or -1 if unknown */
>> +   int location pg_node_attr(jumble_ignore);   /* token location, 
>> or -1
>> +* if unknown */
>> 
>> I have taken this approach in v2 of the patch, shaving ~100 lines of
>> more code as there is no need to mark all these location fields with
>> "jumble_ignore" anymore, except for Const, of course.
> 
> I don't understand why you chose that when we already have an established
> way.  This would just make the jumble annotations inconsistent with the
> other annotations.

Because we don't want to track the location of all the nodes!  If we
do that, pg_stat_statements would begin to parameterize a lot more
things, from column aliases to full contents of IN or WITH clauses.
The root point is that we only want to track the jumble location for
Const nodes now.  In order to do that, there are two approaches:
- Mark all the locations with jumble_ignore: more invasive, but
it requires only one per-field attribute with "jumble_ignore".  This
is what v1 did.
- Mark only the fields where we want to track the location with a
second field attribute, like "jumble_location".  We could restrict that
depending on the field type or its name on top of checking the field
attribute, whatever.  This is what v2 did.
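For context, the v2 approach amounts to a per-field annotation roughly like the sketch below. pg_node_attr() is the marker macro parsed by gen_node_support.pl, defined away here so the fragment compiles standalone, and the heavily reduced Const struct is an assumption for illustration only:

```c
/*
 * pg_node_attr() is only a marker parsed by gen_node_support.pl; it is
 * defined away here so this standalone sketch compiles.
 */
#define pg_node_attr(...)

/* Heavily reduced Const node, for illustration only. */
typedef struct Const
{
	unsigned int consttype;		/* pg_type OID of the constant's datatype */
	int			location pg_node_attr(jumble_location);	/* only Const would
														 * track its token
														 * location for the
														 * jumble */
} Const;
```

Every other node's location field would then be ignored by the generated jumble code without needing any per-field marking.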

Which approach do you prefer?  Marking all the locations we don't want
with jumble_ignore, or introduce a second attribute with
jumble_location.  I'd tend to prefer the approach of minimizing the
number of node and field attributes, FWIW.  Now you have worked on
this area previously, so your view may be better informed than my
thinking process.

The long-term perspective is that I'd like to extend the tracking of
the locations to more DDL nodes, like parameters of SET statements or
parts of CALL statements.  Not to mention that this makes the work of
forks easier.  This is a separate discussion.

> I also have two suggestions to make this patch more palatable:
> 
> 1. Make a separate patch to reformat long comments, like
> 835d476fd21bcfb60b055941dee8c3d9559af14c.
> 
> 2. Make a separate patch to move the jumble support to
> src/backend/nodes/jumblefuncs.c.  (It was probably a mistake that it wasn't
> there to begin with, and some of the errors of omission alluded to above are
> probably caused by it having been hidden away in the wrong place.)

Both suggestions make sense.  I'll shape the next series of the patch
to do something among these lines.

My question about the location tracking greatly influences the first
patch (comment reformating) and third patch (automated generation of
the code) of the series, so do you have a preference about that? 
--
Michael


signature.asc
Description: PGP signature


Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2023-01-16 Thread David Rowley
On Tue, 17 Jan 2023 at 13:16, Dean Rasheed  wrote:
>
> On Wed, 11 Jan 2023 at 05:24, David Rowley  wrote:
> >
> > I'm wondering if 1349d279 should have just never opted to presort
> > Aggrefs which have volatile functions so that the existing behaviour
> > of unordered output is given always and nobody is fooled into thinking
> > this works correctly only to be disappointed later when they add some
> > other aggregate to their query or if we should fix both.  Certainly,
> > it seems much easier to do the former.
> >
>
> I took a look at this, and I agree that the best solution is probably
> to have make_pathkeys_for_groupagg() ignore Aggrefs that contain
> volatile functions.

Thanks for giving that some additional thought.  I've just pushed a
fix which adjusts things that way.

David




RE: Perform streaming logical transactions by background workers and parallel apply

2023-01-16 Thread houzj.f...@fujitsu.com
On Tuesday, January 17, 2023 11:32 AM Peter Smith  wrote:
> 
> On Tue, Jan 17, 2023 at 1:21 PM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Tuesday, January 17, 2023 5:43 AM Peter Smith
>  wrote:
> > >
> > > On Mon, Jan 16, 2023 at 5:41 PM Amit Kapila
> > > 
> > > wrote:
> > > >
> > > > On Mon, Jan 16, 2023 at 10:24 AM Peter Smith
> > > > 
> > > wrote:
> > > > >
> > > > > 2.
> > > > >
> > > > >  /*
> > > > > + * Return the pid of the leader apply worker if the given pid
> > > > > +is the pid of a
> > > > > + * parallel apply worker, otherwise return InvalidPid.
> > > > > + */
> > > > > +pid_t
> > > > > +GetLeaderApplyWorkerPid(pid_t pid) {  int leader_pid =
> > > > > +InvalidPid;  int i;
> > > > > +
> > > > > + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> > > > > +
> > > > > + for (i = 0; i < max_logical_replication_workers; i++) {
> > > > > + LogicalRepWorker *w = >workers[i];
> > > > > +
> > > > > + if (isParallelApplyWorker(w) && w->proc && pid ==
> > > > > + w->proc->pid) { leader_pid = w->leader_pid; break; } }
> > > > > +
> > > > > + LWLockRelease(LogicalRepWorkerLock);
> > > > > +
> > > > > + return leader_pid;
> > > > > +}
> > > > >
> > > > > 2a.
> > > > > IIUC the IsParallelApplyWorker macro does nothing except check
> > > > > that the leader_pid is not InvalidPid anyway, so AFAIK this
> > > > > algorithm does not benefit from using this macro because we will
> > > > > want to return InvalidPid anyway if the given pid matches.
> > > > >
> > > > > So the inner condition can just say:
> > > > >
> > > > > if (w->proc && w->proc->pid == pid) { leader_pid =
> > > > > w->leader_pid; break; }
> > > > >
> > > >
> > > > Yeah, this should also work but I feel the current one is explicit
> > > > and more clear.
> > >
> > > OK.
> > >
> > > But, I have one last comment about this function -- I saw there are
> > > already other functions that iterate max_logical_replication_workers
> > > like this looking for things:
> > > - logicalrep_worker_find
> > > - logicalrep_workers_find
> > > - logicalrep_worker_launch
> > > - logicalrep_sync_worker_count
> > >
> > > So I felt this new function (currently called
> > > GetLeaderApplyWorkerPid) ought to be named similarly to those ones.
> > > e.g. call it something like "logicalrep_worker_find_pa_leader_pid".
> > >
> >
> > I am not sure we can use that name, because currently all the API names
> > in the launcher that are used by other modules (not related to
> > subscription) follow the AxxBxx style (see the functions in
> > logicallauncher.h).
> > logicalrep_worker_xxx style functions are currently only declared in
> > worker_internal.h.
> >
> 
> OK. I didn't know there was another header convention that you were following.
> In that case, it is fine to leave the name as-is.

Thanks for confirming!

Attach the new version 0001 patch which addressed all other comments.

Best regards,
Hou zj


v82-0001-Display-the-leader-apply-worker-s-PID-for-parall.patch
Description:  v82-0001-Display-the-leader-apply-worker-s-PID-for-parall.patch


Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-01-16 Thread Michael Paquier
On Mon, Jan 16, 2023 at 04:48:28PM -0500, Tom Lane wrote:
> I'm slightly tempted to back-patch 002_pg_upgrade.pl so that there
> is an in-tree way to verify back-branch AdjustUpgrade.pm files.
> On the other hand, it's hard to believe that testing that in
> HEAD won't be sufficient; I doubt the back-branch copies will
> need to change much.

Backpatching 002_pg_upgrade.pl requires a bit more than the test:
there is one compatibility gotcha as of dc57366.  I did not backpatch
that commit because nobody had complained about it before I found it,
but the test would require it.

By the way, thanks for your work on this stuff :)
--
Michael


signature.asc
Description: PGP signature


Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-16 Thread Peter Smith
On Tue, Jan 17, 2023 at 1:21 PM houzj.f...@fujitsu.com
 wrote:
>
> On Tuesday, January 17, 2023 5:43 AM Peter Smith  
> wrote:
> >
> > On Mon, Jan 16, 2023 at 5:41 PM Amit Kapila 
> > wrote:
> > >
> > > On Mon, Jan 16, 2023 at 10:24 AM Peter Smith 
> > wrote:
> > > >
> > > > 2.
> > > >
> > > >  /*
> > > > + * Return the pid of the leader apply worker if the given pid is
> > > > +the pid of a
> > > > + * parallel apply worker, otherwise return InvalidPid.
> > > > + */
> > > > +pid_t
> > > > +GetLeaderApplyWorkerPid(pid_t pid)
> > > > +{
> > > > + int leader_pid = InvalidPid;
> > > > + int i;
> > > > +
> > > > + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> > > > +
> > > > + for (i = 0; i < max_logical_replication_workers; i++) {
> > > > + LogicalRepWorker *w = >workers[i];
> > > > +
> > > > + if (isParallelApplyWorker(w) && w->proc && pid == w->proc->pid) {
> > > > + leader_pid = w->leader_pid; break; } }
> > > > +
> > > > + LWLockRelease(LogicalRepWorkerLock);
> > > > +
> > > > + return leader_pid;
> > > > +}
> > > >
> > > > 2a.
> > > > IIUC the IsParallelApplyWorker macro does nothing except check that
> > > > the leader_pid is not InvalidPid anyway, so AFAIK this algorithm
> > > > does not benefit from using this macro because we will want to
> > > > return InvalidPid anyway if the given pid matches.
> > > >
> > > > So the inner condition can just say:
> > > >
> > > > if (w->proc && w->proc->pid == pid)
> > > > {
> > > > leader_pid = w->leader_pid;
> > > > break;
> > > > }
> > > >
> > >
> > > Yeah, this should also work but I feel the current one is explicit and
> > > more clear.
> >
> > OK.
> >
> > But, I have one last comment about this function -- I saw there are already
> > other functions that iterate max_logical_replication_workers like this 
> > looking
> > for things:
> > - logicalrep_worker_find
> > - logicalrep_workers_find
> > - logicalrep_worker_launch
> > - logicalrep_sync_worker_count
> >
> > So I felt this new function (currently called GetLeaderApplyWorkerPid) ought
> > to be named similarly to those ones. e.g. call it something like
> > "logicalrep_worker_find_pa_leader_pid".
> >
>
> I am not sure we can use that name, because currently all the API names
> in the launcher that are used by other modules (not related to
> subscription) follow the AxxBxx style (see the functions in
> logicallauncher.h).
> logicalrep_worker_xxx style functions are currently only declared in
> worker_internal.h.
>

OK. I didn't know there was another header convention that you were
following.  In that case, it is fine to leave the name as-is.
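For readers following the thread, the lookup under discussion can be modeled independently of the launcher internals. The slot layout, names, and the absence of locking below are simplifications, not the real LogicalRepWorker structure (the actual function also takes LogicalRepWorkerLock in shared mode around the scan):

```c
#define InvalidPid (-1)

/* Simplified stand-in for a LogicalRepWorker slot. */
typedef struct
{
	int			pid;			/* worker pid, InvalidPid if slot unused */
	int			leader_pid;		/* leader's pid for a parallel apply
								 * worker, InvalidPid otherwise */
} WorkerSlot;

/*
 * Return the leader's pid when the given pid belongs to a parallel
 * apply worker, InvalidPid otherwise -- the behavior described in the
 * quoted review comments above.
 */
static int
find_leader_apply_worker_pid(const WorkerSlot *workers, int nworkers, int pid)
{
	for (int i = 0; i < nworkers; i++)
	{
		if (workers[i].leader_pid != InvalidPid && workers[i].pid == pid)
			return workers[i].leader_pid;
	}

	return InvalidPid;
}
```

Note how the leader_pid check plays the role of the isParallelApplyWorker() test: a leader apply worker's own slot has leader_pid == InvalidPid, so asking for the leader of a leader returns InvalidPid.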

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: typo in the subscription command tests

2023-01-16 Thread Amit Kapila
On Tue, Jan 17, 2023 at 8:30 AM Takamichi Osumi (Fujitsu)
 wrote:
>
> While working on a logical replication patch,
> I found a typo on HEAD.
> Attached is a patch to fix it.
>

LGTM.

-- 
With Regards,
Amit Kapila.




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-16 Thread Amit Kapila
On Tue, Jan 17, 2023 at 8:35 AM Masahiko Sawada  wrote:
>
> On Mon, Jan 16, 2023 at 3:19 PM Amit Kapila  wrote:
> >
> > Okay, I have added the comments in get_transaction_apply_action() and
> > updated the comments to refer to the enum TransApplyAction where all
> > the actions are explained.
>
> Thank you for the patch.
>
> @@ -1710,6 +1712,7 @@ apply_handle_stream_stop(StringInfo s)
> }
>
> in_streamed_transaction = false;
> +   stream_xid = InvalidTransactionId;
>
> We reset stream_xid also in stream_close_file() but probably it's no
> longer necessary?
>

I think so.

> How about adding an assertion in apply_handle_stream_start() to make
> sure the stream_xid is invalid?
>

I think it would be better to add such an assert in
apply_handle_begin/apply_handle_begin_prepare because there won't be a
problem if we get a start_stream message even when stream_xid is valid.
However, maybe it is better to add in all three functions
(apply_handle_begin/apply_handle_begin_prepare/apply_handle_stream_start).
What do you think?

> ---
> It's not related to this issue but I realized that if the action
> returned by get_transaction_apply_action() is not handled in the
> switch statement, we do only Assert(false). Is it better to raise an
> error like "unexpected apply action %d" just in case in order to
> detect failure cases also in the production environment?
>

Yeah, that may be better. Shall we do that as part of this patch only
or as a separate patch?
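As a sketch of the assertion being proposed (the placement and the reduced helpers below are assumptions; in the tree, stream_xid is a file-level variable in worker.c and TransactionIdIsValid comes from the transam headers):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)
#define TransactionIdIsValid(xid) ((xid) != InvalidTransactionId)
#define Assert(condition) assert(condition)

/* Simplified stand-in for worker.c's file-level stream_xid. */
static TransactionId stream_xid = InvalidTransactionId;

/*
 * Each transaction-start handler (apply_handle_begin,
 * apply_handle_begin_prepare, apply_handle_stream_start) would open
 * with the same check: no streamed transaction may still be
 * registered when a new transaction starts.
 */
static void
apply_handle_begin_sketch(void)
{
	Assert(!TransactionIdIsValid(stream_xid));
	/* ... the rest of the handler would run here ... */
}
```

The assert only fires if a previous streamed transaction was never properly cleared, which is exactly the invariant the reset in apply_handle_stream_stop() is meant to maintain.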

-- 
With Regards,
Amit Kapila.




Re: Show various offset arrays for heap WAL records

2023-01-16 Thread Peter Geoghegan
On Wed, Jan 11, 2023 at 3:11 PM Peter Geoghegan  wrote:
> On Wed, Jan 11, 2023 at 3:00 PM Andres Freund  wrote:
> > What are your thoughts about the place for the helper functions? You're ok
> > with rmgrdesc_utils.[ch]?
>
> Yeah, that seems okay.

BTW, while playing around with this patch today, I noticed that it
won't display the number of elements in each offset array directly.
Perhaps it's worth including that, too?

-- 
Peter Geoghegan




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-16 Thread Masahiko Sawada
On Mon, Jan 16, 2023 at 3:19 PM Amit Kapila  wrote:
>
> On Sun, Jan 15, 2023 at 10:39 PM Tomas Vondra
>  wrote:
> >
> > I think there's a bug in how get_transaction_apply_action() interacts
> > with handle_streamed_transaction() to decide whether the transaction is
> > streamed or not. Originally, the code was simply:
> >
> > /* not in streaming mode */
> > if (!in_streamed_transaction)
> > return false;
> >
> > But now this decision was moved to get_transaction_apply_action(), which
> > does this:
> >
> > if (am_parallel_apply_worker())
> > {
> > return TRANS_PARALLEL_APPLY;
> > }
> > else if (in_remote_transaction)
> > {
> > return TRANS_LEADER_APPLY;
> > }
> >
> > and handle_streamed_transaction() then uses the result like this:
> >
> > /* not in streaming mode */
> > if (apply_action == TRANS_LEADER_APPLY)
> > return false;
> >
> > Notice this is not equal to the original behavior, because the two flags
> > (in_remote_transaction and in_streamed_transaction) are not inverse.
> > That is,
> >
> >in_remote_transaction=false
> >
> > does not imply we're processing a streamed transaction. Both flags are
> > allowed to be false, i.e. a change may be "non-transactional" and not
> > streamed, though the only example of such a thing in the protocol is
> > logical messages. Those are, however, ignored in the apply worker, so I'm
> > not surprised no existing test failed on this.
> >
>
> Right, this is the reason we didn't catch it in our testing.
>
> > So I think get_transaction_apply_action() should do this:
> >
> > if (am_parallel_apply_worker())
> > {
> > return TRANS_PARALLEL_APPLY;
> > }
> > else if (!in_streamed_transaction)
> > {
> > return TRANS_LEADER_APPLY;
> > }
> >
>
> Yeah, something like this would work but some of the callers other
> than handle_streamed_transaction() also need to be changed. See
> attached.
>
> > FWIW I've noticed this after rebasing the sequence decoding patch, which
> > adds another type of protocol message with the transactional vs.
> > non-transactional behavior, similar to "logical messages" except that in
> > this case the worker does not ignore that.
> >
> > Also, I think get_transaction_apply_action() would deserve better
> > comments explaining how/why it makes the decisions.
> >
>
> Okay, I have added the comments in get_transaction_apply_action() and
> updated the comments to refer to the enum TransApplyAction where all
> the actions are explained.

Thank you for the patch.

@@ -1710,6 +1712,7 @@ apply_handle_stream_stop(StringInfo s)
}

in_streamed_transaction = false;
+   stream_xid = InvalidTransactionId;

We reset stream_xid also in stream_close_file() but probably it's no
longer necessary?

How about adding an assertion in apply_handle_stream_start() to make
sure the stream_xid is invalid?

---
It's not related to this issue but I realized that if the action
returned by get_transaction_apply_action() is not handled in the
switch statement, we do only Assert(false). Is it better to raise an
error like "unexpected apply action %d" just in case in order to
detect failure cases also in the production environment?


Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




typo in the subscription command tests

2023-01-16 Thread Takamichi Osumi (Fujitsu)
Hi,

While working on a logical replication patch,
I found a typo on HEAD.
Attached is the modification patch for this.


Best Regards,
Takamichi Osumi



v1-0001-Fix-a-typo-in-subscription-command-tests.patch
Description: v1-0001-Fix-a-typo-in-subscription-command-tests.patch


Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-01-16 Thread Tom Lane
I've pushed the per-branch AdjustUpgrade.pm files and tested by performing
a fresh round of buildfarm runs with the patched TestUpgradeXversion.pm
file.  I think we're in good shape with this project.

I dunno if we want to stretch buildfarm owners' patience with yet
another BF client release right now.  On the other hand, I'm antsy
to see if we can un-revert 1b4d280ea after doing a little more
work in AdjustUpgrade.pm.

regards, tom lane




Re: New strategies for freezing, advancing relfrozenxid early

2023-01-16 Thread Peter Geoghegan
On Mon, Jan 16, 2023 at 10:00 AM Peter Geoghegan  wrote:
> Now we treat the scanning and freezing strategies as two independent
> choices. Of course they're not independent in any practical sense, but
> I think it's slightly simpler and more elegant that way -- it makes
> the GUC vacuum_freeze_strategy_threshold strictly about freezing
> strategy, while still leading to VACUUM advancing relfrozenxid in a
> way that makes sense. It just happens as a second order effect. Why
> add a special case?

This might be a better way to explain it:

The main page-level freezing commit (commit 1de58df4) already added an
optimization that triggers page-level freezing "early" (early relative
to vacuum_freeze_min_age). This happens whenever a page already needs
to have an FPI logged inside lazy_scan_prune -- even when we're using
the lazy freezing strategy. The optimization isn't configurable, and
gets applied regardless of freezing strategy (technically there is no
such thing as freezing strategies on HEAD just yet, though HEAD still
has this optimization).

There will be workloads where the FPI optimization will result in
freezing many more pages -- especially when data checksums are in use
(since then we could easily need to log an FPI just so pruning can set
a hint bit). As a result, certain VACUUMs that use the lazy freezing
strategy will freeze in almost the same way as an equivalent VACUUM
using the eager freezing strategy. Such a "nominally lazy but actually
quite eager" VACUUM operation should get the same benefit in terms of
relfrozenxid advancement as it would if it really had used the eager
freezing strategy instead. It's fairly obvious that we'll get the same
benefit in relfrozenxid advancement (comparable relfrozenxid results
for comparable freezing work), since the way that VACUUM decides on
its scanning strategy is not conditioned on freezing strategy (whether
by the ongoing VACUUM or any other VACUUM against the same table).

All that matters is the conditions in the table (in particular the
added cost of opting for eager scanning over lazy scanning) as
indicated by the visibility map at the start of each VACUUM -- how
those conditions came about really isn't interesting at that point.
And so lazy_scan_strategy doesn't care about them when it chooses
VACUUM's scanning strategy.

There are even tables/workloads where relfrozenxid will be able to
jump forward by a huge amount whenever VACUUM chooses the eager
scanning strategy, despite the fact that VACUUM generally does little
or no freezing to make that possible:

https://wiki.postgresql.org/wiki/Freezing/skipping_strategies_patch:_motivating_examples#Patch_3

--
Peter Geoghegan




RE: Perform streaming logical transactions by background workers and parallel apply

2023-01-16 Thread houzj.f...@fujitsu.com
On Tuesday, January 17, 2023 5:43 AM Peter Smith  wrote:
> 
> On Mon, Jan 16, 2023 at 5:41 PM Amit Kapila 
> wrote:
> >
> > On Mon, Jan 16, 2023 at 10:24 AM Peter Smith 
> wrote:
> > >
> > > 2.
> > >
> > >  /*
> > > + * Return the pid of the leader apply worker if the given pid is
> > > +the pid of a
> > > + * parallel apply worker, otherwise return InvalidPid.
> > > + */
> > > +pid_t
> > > +GetLeaderApplyWorkerPid(pid_t pid)
> > > +{
> > > + int leader_pid = InvalidPid;
> > > + int i;
> > > +
> > > + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> > > +
> > > + for (i = 0; i < max_logical_replication_workers; i++) {
> > > + LogicalRepWorker *w = >workers[i];
> > > +
> > > + if (isParallelApplyWorker(w) && w->proc && pid == w->proc->pid) {
> > > + leader_pid = w->leader_pid; break; } }
> > > +
> > > + LWLockRelease(LogicalRepWorkerLock);
> > > +
> > > + return leader_pid;
> > > +}
> > >
> > > 2a.
> > > IIUC the IsParallelApplyWorker macro does nothing except check that
> > > the leader_pid is not InvalidPid anyway, so AFAIK this algorithm
> > > does not benefit from using this macro because we will want to
> > > return InvalidPid anyway if the given pid matches.
> > >
> > > So the inner condition can just say:
> > >
> > > if (w->proc && w->proc->pid == pid)
> > > {
> > > leader_pid = w->leader_pid;
> > > break;
> > > }
> > >
> >
> > Yeah, this should also work but I feel the current one is explicit and
> > more clear.
> 
> OK.
> 
> But, I have one last comment about this function -- I saw there are already
> other functions that iterate max_logical_replication_workers like this looking
> for things:
> - logicalrep_worker_find
> - logicalrep_workers_find
> - logicalrep_worker_launch
> - logicalrep_sync_worker_count
> 
> So I felt this new function (currently called GetLeaderApplyWorkerPid) ought
> to be named similarly to those ones. e.g. call it something like
> "logicalrep_worker_find_pa_leader_pid".
> 

I am not sure we can use that name, because currently all the API names in
the launcher that are used by other modules (not related to subscription)
follow the AxxBxx style (see the functions in logicallauncher.h).
logicalrep_worker_xxx style functions are currently only declared in
worker_internal.h.

Best regards,
Hou zj



Re: almost-super-user problems that we haven't fixed yet

2023-01-16 Thread Robert Haas
On Mon, Jan 16, 2023 at 5:37 PM Nathan Bossart  wrote:
> On Mon, Jan 16, 2023 at 02:29:56PM -0500, Robert Haas wrote:
> > 4. You can reserve a small number of connections for the superuser
> > with superuser_reserved_connections, but there's no way to do a
> > similar thing for any other user. As mentioned above, a CREATEROLE
> > user could set connection limits for every created role such that the
> > sum of those limits is less than max_connections by some margin, but
> > that restricts each of those roles individually, not all of them in
> > the aggregate. Maybe we could address this by inventing a new GUC
> > reserved_connections and a predefined role
> > pg_use_reserved_connections.
>
> I've written something like this before, and I'd be happy to put together a
> patch if there is interest.

Cool. I had been thinking of coding it up myself, but you doing it works, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: New strategies for freezing, advancing relfrozenxid early

2023-01-16 Thread Peter Geoghegan
On Mon, Jan 16, 2023 at 10:10 AM Peter Geoghegan  wrote:
> Attached is v16, which incorporates some of Matthias' feedback.

0001 (the freezing strategies patch) is now committable IMV. Or at
least will be once I polish the docs a bit more. I plan on committing
0001 some time next week, barring any objections.

I should point out that 0001 is far shorter and simpler than the
page-level freezing commit that already went in (commit 1de58df4). The
only thing in 0001 that seems like it might be a bit controversial
(when considered on its own) is the addition of the
vacuum_freeze_strategy_threshold GUC/reloption. Note in particular
that vacuum_freeze_strategy_threshold doesn't look like any other
existing GUC; it gets applied as a threshold on the size of the rel's
main fork at the beginning of vacuumlazy.c processing. As far as I
know there are no objections to that approach at this time, but it
does still seem worth drawing attention to now.

0001 also makes unlogged tables and temp tables always use eager
freezing strategy, no matter how the GUC/reloption are set. This seems
*very* easy to justify, since the potential downside of such a policy
is obviously extremely low, even when we make very pessimistic
assumptions. The usual cost we need to worry about when it comes to
freezing is the added WAL overhead -- that clearly won't apply when
we're vacuuming non-permanent tables. That really just leaves the cost
of dirtying extra pages, which in general could have a noticeable
system-level impact in the case of unlogged tables.

Dirtying extra pages when vacuuming an unlogged table is also a
non-issue. Even the eager freezing strategy only freezes "extra" pages
("extra" relative to the lazy strategy behavior) given a page that
will be set all-visible in any case [1]. Such a page will need to have
its page-level PD_ALL_VISIBLE bit set in any case -- which is already
enough to dirty the page. And so there can never be any additional
pages dirtied as a result of the special policy 0001 adds for
non-permanent relations.

[1] 
https://wiki.postgresql.org/wiki/Freezing/skipping_strategies_patch:_motivating_examples#Patch_2
--
Peter Geoghegan




Re: doc: add missing "id" attributes to extension packaging page

2023-01-16 Thread Karl O. Pinc
On Mon, 16 Jan 2023 11:14:35 -0600
"Karl O. Pinc"  wrote:

> On Sun, 15 Jan 2023 18:01:50 -0600
> "Karl O. Pinc"  wrote:
> 
> > Regards XSLT:
> > 
> > I believe the XSLT needs work.  

> In XSLT 1.0 there is no xml:default-mode.  So I _think_ what you do
> then is modify the built-in template rules so that the (default)
> template (mode='') is invoked when there is no 'postgres-mode'
> version of the template, but otherwise the 'postgres-mode' version of
> the template is invoked.  Your 'postgres-mode' templates will
> xsl:call-template the default template, adding whatever they want to
> the output produced by the default template.

Or maybe the right way is to set a mode at the very top,
the first apply-templates call, and not mess with the
built-in templates at all.  (You'd write your own
"postgres-mode" templates the same way, to "wrap"
and call the default templates.)

Think of the mode as an implicit argument that's preserved and
passed down through each template invocation without having to
be explicitly specified by the calling code.

Regards

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-01-16 Thread Andrew Dunstan


On 2023-01-16 Mo 18:11, Tom Lane wrote:
> I wrote:
>> I think we're about ready to go, except for cutting down
>> AdjustUpgrade.pm to make versions to put in the back branches.
> Hmmm ... so upon trying to test in the back branches, I soon
> discovered that PostgreSQL/Version.pm isn't there before v15.
>
> I don't see a good reason why we couldn't back-patch it, though.
> Any objections?
>
>   


No, that seems perfectly reasonable.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2023-01-16 Thread Dean Rasheed
On Wed, 11 Jan 2023 at 05:24, David Rowley  wrote:
>
> I'm wondering if 1349d279 should have just never opted to presort
> Aggrefs which have volatile functions so that the existing behaviour
> of unordered output is given always and nobody is fooled into thinking
> this works correctly only to be disappointed later when they add some
> other aggregate to their query or if we should fix both.  Certainly,
> it seems much easier to do the former.
>

I took a look at this, and I agree that the best solution is probably
to have make_pathkeys_for_groupagg() ignore Aggrefs that contain
volatile functions. Not only is that the simplest solution, preserving
the old behaviour, I think it's required for correctness.

Aside from the fact that I don't think such aggregates would benefit
from the optimisation introduced by 1349d279, I think it would be
incorrect if there was more than one such aggregate having the same
sort expression, because I think that volatile sorting should be
evaluated separately for each aggregate. For example:

SELECT string_agg(a::text, ',' ORDER BY random()),
   string_agg(a::text, ',' ORDER BY random())
FROM generate_series(1,3) s(a);

 string_agg | string_agg
+
 2,1,3  | 3,2,1
(1 row)

so pre-sorting wouldn't be right (or at least it would change existing
behaviour in a surprising way).

Regards,
Dean




Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-01-16 Thread Tom Lane
I wrote:
> I think we're about ready to go, except for cutting down
> AdjustUpgrade.pm to make versions to put in the back branches.

Hmmm ... so upon trying to test in the back branches, I soon
discovered that PostgreSQL/Version.pm isn't there before v15.

I don't see a good reason why we couldn't back-patch it, though.
Any objections?

regards, tom lane




Re: Refactor recordExtObjInitPriv()

2023-01-16 Thread Nathan Bossart
On Mon, Jan 16, 2023 at 12:01:47PM +0100, Peter Eisentraut wrote:
> I have updated the patch as you suggested and split out the aggregate issue
> into a separate patch for clarity.

LGTM

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: recovery modules

2023-01-16 Thread Nathan Bossart
On Mon, Jan 16, 2023 at 04:36:01PM +0900, Michael Paquier wrote:
> Once this issue was fixed, nothing else stood out, so applied this
> part.

Thanks!  I've attached a rebased version of the rest of the patch set.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From c2e6eb1251b47743c50717cad9adc49ccd7249d5 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 23 Dec 2022 16:53:38 -0800
Subject: [PATCH v7 1/2] Refactor code for restoring files via shell.

Presently, restore_command uses a different code path than
archive_cleanup_command and recovery_end_command.  These code paths
are similar and can be easily combined.
---
 src/backend/access/transam/shell_restore.c | 90 ++
 1 file changed, 39 insertions(+), 51 deletions(-)

diff --git a/src/backend/access/transam/shell_restore.c b/src/backend/access/transam/shell_restore.c
index 7753a7d667..f5b6cf174e 100644
--- a/src/backend/access/transam/shell_restore.c
+++ b/src/backend/access/transam/shell_restore.c
@@ -25,11 +25,10 @@
 #include "storage/ipc.h"
 #include "utils/wait_event.h"
 
-static void ExecuteRecoveryCommand(const char *command,
+static bool ExecuteRecoveryCommand(const char *command,
    const char *commandName,
-   bool failOnSignal,
-   uint32 wait_event_info,
-   const char *lastRestartPointFileName);
+   bool failOnSignal, bool exitOnSigterm,
+   uint32 wait_event_info, int fail_elevel);
 
 /*
  * Attempt to execute a shell-based restore command.
@@ -41,25 +40,12 @@ shell_restore(const char *file, const char *path,
 			  const char *lastRestartPointFileName)
 {
 	char	   *cmd;
-	int			rc;
+	bool		ret;
 
 	/* Build the restore command to execute */
 	cmd = BuildRestoreCommand(recoveryRestoreCommand, path, file,
 			  lastRestartPointFileName);
 
-	ereport(DEBUG3,
-			(errmsg_internal("executing restore command \"%s\"", cmd)));
-
-	/*
-	 * Copy xlog from archival storage to XLOGDIR
-	 */
-	fflush(NULL);
-	pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND);
-	rc = system(cmd);
-	pgstat_report_wait_end();
-
-	pfree(cmd);
-
 	/*
 	 * Remember, we rollforward UNTIL the restore fails so failure here is
 	 * just part of the process... that makes it difficult to determine
@@ -84,17 +70,11 @@ shell_restore(const char *file, const char *path,
 	 *
 	 * We treat hard shell errors such as "command not found" as fatal, too.
 	 */
-	if (rc != 0)
-	{
-		if (wait_result_is_signal(rc, SIGTERM))
-			proc_exit(1);
-
-		ereport(wait_result_is_any_signal(rc, true) ? FATAL : DEBUG2,
-				(errmsg("could not restore file \"%s\" from archive: %s",
-						file, wait_result_to_str(rc))));
-	}
+	ret = ExecuteRecoveryCommand(cmd, "restore_command", true, true,
+ WAIT_EVENT_RESTORE_COMMAND, DEBUG2);
+	pfree(cmd);
 
-	return (rc == 0);
+	return ret;
 }
 
 /*
@@ -103,9 +83,14 @@ shell_restore(const char *file, const char *path,
 void
 shell_archive_cleanup(const char *lastRestartPointFileName)
 {
-	ExecuteRecoveryCommand(archiveCleanupCommand, "archive_cleanup_command",
-		   false, WAIT_EVENT_ARCHIVE_CLEANUP_COMMAND,
-		   lastRestartPointFileName);
+	char	   *cmd;
+
+	cmd = replace_percent_placeholders(archiveCleanupCommand,
+	   "archive_cleanup_command",
+	   "r", lastRestartPointFileName);
+	(void) ExecuteRecoveryCommand(cmd, "archive_cleanup_command", false, false,
+  WAIT_EVENT_ARCHIVE_CLEANUP_COMMAND, WARNING);
+	pfree(cmd);
 }
 
 /*
@@ -114,9 +99,14 @@ shell_archive_cleanup(const char *lastRestartPointFileName)
 void
 shell_recovery_end(const char *lastRestartPointFileName)
 {
-	ExecuteRecoveryCommand(recoveryEndCommand, "recovery_end_command", true,
-		   WAIT_EVENT_RECOVERY_END_COMMAND,
-		   lastRestartPointFileName);
+	char	   *cmd;
+
+	cmd = replace_percent_placeholders(recoveryEndCommand,
+	   "recovery_end_command",
+	   "r", lastRestartPointFileName);
+	(void) ExecuteRecoveryCommand(cmd, "recovery_end_command", true, false,
+  WAIT_EVENT_RECOVERY_END_COMMAND, WARNING);
+	pfree(cmd);
 }
 
 /*
@@ -124,27 +114,22 @@ shell_recovery_end(const char *lastRestartPointFileName)
  *
  * 'command' is the shell command to be executed, 'commandName' is a
  * human-readable name describing the command emitted in the logs. If
- * 'failOnSignal' is true and the command is killed by a signal, a FATAL
- * error is thrown. Otherwise a WARNING is emitted.
+ * 'failOnSignal' is true and the command is killed by a signal, a FATAL error
+ * is thrown. Otherwise, 'fail_elevel' is used for the log message.  If
+ * 'exitOnSigterm' is true and the command is killed by SIGTERM, we exit
+ * immediately.
  *
- * This is currently used for recovery_end_command and archive_cleanup_command.
+ * Returns whether the command succeeded.
  */
-static void
+static bool
 ExecuteRecoveryCommand(const char *command, const char *commandName,
-	   bool failOnSignal, uint32 wait_event_info,
-	   const char 

Re: almost-super-user problems that we haven't fixed yet

2023-01-16 Thread Nathan Bossart
On Mon, Jan 16, 2023 at 02:29:56PM -0500, Robert Haas wrote:
> 4. You can reserve a small number of connections for the superuser
> with superuser_reserved_connections, but there's no way to do a
> similar thing for any other user. As mentioned above, a CREATEROLE
> user could set connection limits for every created role such that the
> sum of those limits is less than max_connections by some margin, but
> that restricts each of those roles individually, not all of them in
> the aggregate. Maybe we could address this by inventing a new GUC
> reserved_connections and a predefined role
> pg_use_reserved_connections.

I've written something like this before, and I'd be happy to put together a
patch if there is interest.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: BF animal malleefowl reported an failure in 001_password.pl

2023-01-16 Thread Thomas Munro
On Sun, Jan 15, 2023 at 12:35 AM Thomas Munro  wrote:
> Here's a sketch of the first idea.

To hit the problem case, the signal needs to arrive in between the
latch->is_set check and the epoll_wait() call, and the handler needs
to take a while to get started.  (If it arrives before the
latch->is_set check we report WL_LATCH_SET immediately, and if it
arrives after the epoll_wait() call begins, we get EINTR and go back
around to the latch->is_set check.)  With some carefully placed sleeps
to simulate a CPU-starved system (see attached) I managed to get a
kill-then-connect sequence to produce:

2023-01-17 10:48:32.508 NZDT [555849] LOG:  nevents = 2
2023-01-17 10:48:32.508 NZDT [555849] LOG:  events[0] = WL_SOCKET_ACCEPT
2023-01-17 10:48:32.508 NZDT [555849] LOG:  events[1] = WL_LATCH_SET
2023-01-17 10:48:32.508 NZDT [555849] LOG:  received SIGHUP, reloading
configuration files

With the patch I posted, we process that in the order we want:

2023-01-17 11:06:31.340 NZDT [562262] LOG:  nevents = 2
2023-01-17 11:06:31.340 NZDT [562262] LOG:  events[1] = WL_LATCH_SET
2023-01-17 11:06:31.340 NZDT [562262] LOG:  received SIGHUP, reloading
configuration files
2023-01-17 11:06:31.344 NZDT [562262] LOG:  events[0] = WL_SOCKET_ACCEPT

Other thoughts:

Another idea would be to teach the latch infrastructure itself to
magically swap latch events to position 0.  Latches are usually
prioritised; it's only in this rare race case that they are not.

Or going the other way, I realise that we're lacking a "wait for
reload" mechanism as discussed in other threads (usually people want
this if they care about its effects on backends other than the
postmaster, where all bets are off and Andres once suggested the
ProcSignalBarrier hammer), and if we ever got something like that it
might be another solution to this particular problem.
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9cedc1b9f0..f5a310f844 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1732,6 +1732,8 @@ ServerLoop(void)
    lengthof(events),
    0 /* postmaster posts no wait_events */ );
 
+		elog(LOG, "nevents = %d", nevents);
+
 		/*
 		 * Latch set by signal handler, or new connection pending on any of
 		 * our sockets? If the latter, fork a child process to deal with it.
@@ -1740,6 +1742,7 @@ ServerLoop(void)
 		{
 			if (events[i].events & WL_LATCH_SET)
 			{
+elog(LOG, "events[%i] = WL_LATCH_SET", i);
 ResetLatch(MyLatch);
 
 /* Process work requested via signal handlers. */
@@ -1756,6 +1759,7 @@ ServerLoop(void)
 			{
 Port	   *port;
 
+elog(LOG, "events[%i] = WL_SOCKET_ACCEPT", i);
 port = ConnCreate(events[i].fd);
 if (port)
 {
@@ -2679,6 +2683,8 @@ handle_pm_reload_request_signal(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
 
+	pg_usleep(1);
+
 	pending_pm_reload_request = true;
 	SetLatch(MyLatch);
 
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index d79d71a851..99b0aade1c 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1465,6 +1465,9 @@ WaitEventSetWait(WaitEventSet *set, long timeout,
 			break;
 		}
 
+		if (set->latch)
+			pg_usleep(100);
+
 		/*
 		 * Wait for events using the readiness primitive chosen at the top of
 		 * this file. If -1 is returned, a timeout has occurred, if 0 we have


Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-16 Thread Peter Geoghegan
On Fri, Jan 13, 2023 at 9:55 PM Andres Freund  wrote:
> I think part of our difference around a high autovacuum_freeze_max_age are due
> to things you're trying to address here - if no-auto-cancel is a separate
> threshold from autovacuum_freeze_max_age, it's less problematic to set
> autovacuum_freeze_max_age to something lower.

That seems like a much smaller difference of opinion than I'd imagined
it was before now.

> But yes, there's a remaining difference of opinion / experience.

Perhaps it's also a communication issue. I don't disagree with
pragmatic decisions that made sense given very particular limitations
in Postgres, which this is starting to sound like now.

When I express skepticism of very high autovacuum_freeze_max_age
settings, it's mostly just that I don't think that age(relfrozenxid)
is at all informative in the way that something that triggers
autovacuum ought to be. Worst of all, the older relfrozenxid gets, the
less informative it becomes. I'm sure that it can be safe to use very
high autovacuum_freeze_max_age values, but it seems like a case of
using a very complicated and unreliable thing to decide when to
VACUUM.

> > It might make sense to always give a small fixed amount of headroom
> > when autovacuum_freeze_max_age is set to very high values. Maybe just
> > 5 million XIDs/MXIDs. That would probably be just as effective as
> > (say) 500 million in almost all cases. But then you have to accept
> > another magic number.
>
> I suspect that most systems with a high autovacuum_freeze_max_age use it
> because 200m leads to too frequent autovacuums - a few million won't do much
> for those.

My point was mostly that it really doesn't cost us anything, and it
could easily help, so it might be worthwhile.

I wonder if these users specifically get too many aggressive
autovacuums with lower autovacuum_freeze_max_age settings? Maybe they
even fail to get enough non-aggressive autovacuums? If that's what it
is, then that makes perfect sense to me. However, I tend to think of
this as an argument against aggressive VACUUMs, not an argument in
favor of high autovacuum_freeze_max_age settings (as I said,
communication is hard).

> How about a float autovacuum_no_auto_cancel_age where positive values are
> treated as absolute values, and negative values are a multiple of
> autovacuum_freeze_max_age? And where the "computed" age is capped at
> vacuum_failsafe_age? A "failsafe" autovacuum clearly shouldn't be cancelled.
>
> And maybe a default setting of -1.8 or so?

I think that would work fine, as a GUC (with no reloption).

> Well, historically there wasn't all that much protection. And I suspect there
> still might be some dragons. We really need to get rid of the remaining places
> that cache 32bit xids across transactions.

Maybe, but our position is that it's supported, it's safe -- there
will be no data loss (or else it's a bug, and one that we'd take very
seriously at that). Obviously it's a bad idea to allow this to happen,
but surely having the system enter xidStopLimit is sufficient
disincentive for users.

> I've seen a ~2TB table grow to ~20TB due to dead tuples, at which point the
> server crash-restarted due to WAL ENOSPC. I think in that case there wasn't
> even DDL, they just needed to make the table readonly, for a brief moment, a
> few times a day. The problem started once the indexes grew to be large enough
> that the time for (re-)finding dead tuples + and an index scan phase got large
> enough that they were unlucky to be killed a few times in a row. After that
> autovac never got through the index scan phase. Not doing any "new" work, just
> collecting the same dead tids over and over, scanning the indexes for those
> tids, never quite finishing.

That makes sense to me, though I wonder if there would have been
another kind of outage (not the one you actually saw) had the
autocancellation behavior somehow been disabled after the first few
cancellations.

This sounds like a great argument in favor of suspend-and-resume as a
way of handling autocancellation -- no useful work needs to be thrown
away for AV to yield for a minute or two. One ambition of mine for the
visibility map snapshot infrastructure was to be able support
suspend-and-resume. It wouldn't be that hard for autovacuum to
serialize everything, and use that to pick up right where an
autovacuum worker left of at when it was autocancelled. Same
OldestXmin starting point as before, same dead_items array, same
number of scanned_pages (and pages left to scan).

> > What you call aspect 2 (the issue with disastrous HW lock traffic jams
> > involving TRUNCATE being run from a cron job, etc) is a big goal of
> > mine for this patch series. You seem unsure of how effective my
> > approach (or an equally simple approach based on table age heuristics)
> > will be. Is that the case?
>
> I was primarily concerned about situations where an admin interactively was
> trying to get rid of the table holding back the xid 

Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-01-16 Thread Tom Lane
OK, here's a v4:

* It works with 002_pg_upgrade.pl now.  The only substantive change
I had to make for that was to define the $old_version arguments as
being always PostgreSQL::Version objects not strings, because
otherwise I got complaints like

Argument "HEAD" isn't numeric in numeric comparison (<=>) at 
/home/postgres/pgsql/src/bin/pg_upgrade/../../../src/test/perl/PostgreSQL/Version.pm
 line 130.

So now TestUpgradeXversion.pm is responsible for performing that
conversion, and also for not doing any conversions on HEAD (which
Andrew wanted anyway).

* I improved pg_upgrade's TESTING directions after figuring out how
to get it to work for contrib modules.

* Incorporated (most of) Andrew's stylistic improvements.

* Simplified TestUpgradeXversion.pm's use of diff, as discussed.

I think we're about ready to go, except for cutting down
AdjustUpgrade.pm to make versions to put in the back branches.

I'm slightly tempted to back-patch 002_pg_upgrade.pl so that there
is an in-tree way to verify back-branch AdjustUpgrade.pm files.
On the other hand, it's hard to believe that testing that in
HEAD won't be sufficient; I doubt the back-branch copies will
need to change much.

regards, tom lane

diff --git a/src/bin/pg_upgrade/TESTING b/src/bin/pg_upgrade/TESTING
index 98286231d7..81a4324a76 100644
--- a/src/bin/pg_upgrade/TESTING
+++ b/src/bin/pg_upgrade/TESTING
@@ -10,31 +10,14 @@ This will run the TAP tests to run pg_upgrade, performing an upgrade
 from the version in this source tree to a new instance of the same
 version.
 
-Testing an upgrade from a different version requires a dump to set up
-the contents of this instance, with its set of binaries.  The following
-variables are available to control the test (see DETAILS below about
-the creation of the dump):
+Testing an upgrade from a different PG version is also possible, and
+provides a more thorough test that pg_upgrade does what it's meant for.
+This requires both a source tree and an installed tree for the old
+version, as well as a dump file to set up the instance to be upgraded.
+The following environment variables must be set to enable this testing:
 export olddump=...somewhere/dump.sql	(old version's dump)
 export oldinstall=...otherversion/	(old version's install base path)
-
-"filter_rules" is a variable that can be used to specify a file with custom
-filtering rules applied before comparing the dumps of the PostgreSQL
-instances near the end of the tests, in the shape of regular expressions
-valid for perl.  This is useful to enforce certain validation cases where
-pg_dump could create inconsistent outputs across major versions.
-For example:
-
-	# Remove all CREATE POLICY statements
-	s/^CREATE\sPOLICY.*//mgx
-	# Replace REFRESH with DROP for materialized views
-	s/^REFRESH\s(MATERIALIZED\sVIEW)/DROP $1/mgx
-
-Lines beginning with '#' and empty lines are ignored.  One rule can be
-defined per line.
-
-Finally, the tests can be done by running
-
-	make check
+See DETAILS below for more information about creation of the dump.
 
 You can also test the different transfer modes (--copy, --link,
 --clone) by setting the environment variable PG_TEST_PG_UPGRADE_MODE
@@ -52,22 +35,32 @@ The most effective way to test pg_upgrade, aside from testing on user
 data, is by upgrading the PostgreSQL regression database.
 
 This testing process first requires the creation of a valid regression
-database dump that can be then used for $olddump.  Such files contain
+database dump that can then be used for $olddump.  Such files contain
 most database features and are specific to each major version of Postgres.
 
 Here are the steps needed to create a dump file:
 
 1)  Create and populate the regression database in the old cluster.
 This database can be created by running 'make installcheck' from
-src/test/regress using its source code tree.
+src/test/regress in the old version's source code tree.
 
-2)  Use pg_dumpall to dump out the contents of the instance, including the
-regression database, in the shape of a SQL file.  This requires the *old*
-cluster's pg_dumpall so as the dump created is compatible with the
-version of the cluster it is dumped into.
+If you like, you can also populate regression databases for one or
+more contrib modules by running 'make installcheck USE_MODULE_DB=1'
+in their directories.  (USE_MODULE_DB is essential so that the
+pg_upgrade test script will understand which database is which.)
 
-Once the dump is created, it can be repeatedly used with $olddump and
-`make check`, that automates the dump of the old database, its upgrade,
-the dump out of the new database and the comparison of the dumps between
-the old and new databases.  The contents of the dumps can also be manually
-compared.
+2)  Use pg_dumpall to dump out the contents of the instance, including the
+regression database(s), into a SQL file.  Use the *old* version's
+pg_dumpall so that the dump 

Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-16 Thread Peter Smith
On Mon, Jan 16, 2023 at 5:41 PM Amit Kapila  wrote:
>
> On Mon, Jan 16, 2023 at 10:24 AM Peter Smith  wrote:
> >
> > 2.
> >
> >  /*
> > + * Return the pid of the leader apply worker if the given pid is the pid 
> > of a
> > + * parallel apply worker, otherwise return InvalidPid.
> > + */
> > +pid_t
> > +GetLeaderApplyWorkerPid(pid_t pid)
> > +{
> > + int leader_pid = InvalidPid;
> > + int i;
> > +
> > + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> > +
> > + for (i = 0; i < max_logical_replication_workers; i++)
> > + {
> > + LogicalRepWorker *w = >workers[i];
> > +
> > + if (isParallelApplyWorker(w) && w->proc && pid == w->proc->pid)
> > + {
> > + leader_pid = w->leader_pid;
> > + break;
> > + }
> > + }
> > +
> > + LWLockRelease(LogicalRepWorkerLock);
> > +
> > + return leader_pid;
> > +}
> >
> > 2a.
> > IIUC the IsParallelApplyWorker macro does nothing except check that
> > the leader_pid is not InvalidPid anyway, so AFAIK this algorithm does
> > not benefit from using this macro because we will want to return
> > InvalidPid anyway if the given pid matches.
> >
> > So the inner condition can just say:
> >
> > if (w->proc && w->proc->pid == pid)
> > {
> > leader_pid = w->leader_pid;
> > break;
> > }
> >
>
> Yeah, this should also work but I feel the current one is explicit and
> more clear.

OK.

But, I have one last comment about this function -- I saw there are
already other functions that iterate max_logical_replication_workers
like this looking for things:
- logicalrep_worker_find
- logicalrep_workers_find
- logicalrep_worker_launch
- logicalrep_sync_worker_count

So I felt this new function (currently called GetLeaderApplyWorkerPid)
ought to be named similarly to those ones. e.g. call it something like
 "logicalrep_worker_find_pa_leader_pid".

>
> > ~
> >
> > 2b.
> > A possible alternative comment.
> >
> > BEFORE
> > Return the pid of the leader apply worker if the given pid is the pid
> > of a parallel apply worker, otherwise return InvalidPid.
> >
> >
> > AFTER
> > If the given pid has a leader apply worker then return the leader pid,
> > otherwise, return InvalidPid.
> >
>
> I don't think that is an improvement.
>
> > ==
> >
> > src/backend/utils/adt/pgstatfuncs.c
> >
> > 3.
> >
> > @@ -434,6 +435,16 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
> >   values[28] = Int32GetDatum(leader->pid);
> >   nulls[28] = false;
> >   }
> > + else
> > + {
> > + int leader_pid = GetLeaderApplyWorkerPid(beentry->st_procpid);
> > +
> > + if (leader_pid != InvalidPid)
> > + {
> > + values[28] = Int32GetDatum(leader_pid);
> > + nulls[28] = false;
> > + }
> > +
> >
> > 3a.
> > There is an existing comment preceding this if/else but it refers only
> > to leaders of parallel groups. Should that comment be updated to
> > mention the leader apply worker too?
> >
>
> Yeah, we can slightly adjust the comments. How about something like the below:
> index 415e711729..7eb668634a 100644
> --- a/src/backend/utils/adt/pgstatfuncs.c
> +++ b/src/backend/utils/adt/pgstatfuncs.c
> @@ -410,9 +410,9 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
>
> /*
>  * If a PGPROC entry was retrieved, display
> wait events and lock
> -* group leader information if any.  To avoid
> extra overhead, no
> -* extra lock is being held, so there is no guarantee 
> of
> -* consistency across multiple rows.
> +* group leader or apply leader information if
> any.  To avoid extra
> +* overhead, no extra lock is being held, so
> there is no guarantee
> +* of consistency across multiple rows.
>  */
> if (proc != NULL)
> {
> @@ -428,7 +428,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
> /*
>  * Show the leader only for active
> parallel workers.  This
>  * leaves the field as NULL for the
> leader of a parallel
> -* group.
> +* group or the leader of a parallel apply.
>  */
> if (leader && leader->pid !=
> beentry->st_procpid)
>

The updated comment LGTM.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-01-16 Thread Maciek Sakrejda
On Fri, Jan 13, 2023 at 10:38 AM Melanie Plageman
 wrote:
>
> Attached is v47.

I missed a couple of versions, but I think the docs are clearer now.
I'm torn on losing some of the detail, but overall I do think it's a
good trade-off. Moving some details out to after the table does keep
the bulk of the view documentation more readable, and the "inform
database tuning" part is great. I really like the idea of a separate
Interpreting Statistics section, but for now this works.

>+  vacuum: I/O operations performed outside of 
>shared
>+  buffers while vacuuming and analyzing permanent relations.

Why only permanent relations? Are temporary relations treated
differently? I imagine if someone has a temp-table-heavy workload that
requires regularly vacuuming and analyzing those relations, this point
may be confusing without some additional explanation.

Other than that, this looks great.

Thanks,
Maciek




"Measuring timing overhead" in docs seems misleading

2023-01-16 Thread Andres Freund
Hi,

I was trying to extract a commitable piece out of [1]. To be able to judge
changes in timing overhead more accurately, I thought it'd be sensible to
update pg_test_timing to report nanoseconds instead of microseconds. Which
lead to trying to update pg_test_timing's docs [2].

The "Measuring Executor Timing Overhead" section seems misleading:
  
   The i7-860 system measured runs the count query in 9.8 ms while
   the EXPLAIN ANALYZE version takes 16.6 ms, each
   processing just over 100,000 rows.  That 6.8 ms difference means the timing
   overhead per row is 68 ns, about twice what pg_test_timing estimated it
   would be.  Even that relatively small amount of overhead is making the fully
   timed count statement take almost 70% longer.  On more substantial queries,
   the timing overhead would be less problematic.
  

The main reason for the ~2x discrepancy is that we do 2 timestamp calls for
each invocation of an executor node, one in InstrStartNode(), one in
InstrStopNode(). I think this should be clarified in the section?


I also think we should update the section to compare
EXPLAIN (ANALYZE, TIMING OFF) with
EXPLAIN (ANALYZE, TIMING ON)
rather than comparing the "bare" statement with EXPLAIN ANALYZE. There's
plenty other overhead in EXPLAIN, even without TIMING ON.

With the instr_time-as-nanosec patches applied (I'll post a new version in a
few minutes), I get the following:

pg_test_timing:
Per loop time including overhead: 13.97 ns
Histogram of timing durations:
 < ns   % of total  count
    16     97.48221  209400569
    32      2.51201    5396022
    64      0.00477      10246
   128      0.00030        640
   256      0.00005        117
   512      0.00000          0
  1024      0.00000          3
  2048      0.00034        729
  4096      0.00001         14
  8192      0.00000          8
 16384      0.00015        320
 32768      0.00014        303
 65536      0.00001         26
131072      0.00000          0
262144      0.00000          1

psql -Xc 'DROP TABLE IF EXISTS t; CREATE TABLE t AS SELECT * FROM generate_series(1, 100000) g(i);' && pgbench -n -r -t 100 -f <(echo -e "SELECT COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t;EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*) FROM t;") | grep '^ '
DROP TABLE
SELECT 100000
 3.431   0  SELECT COUNT(*) FROM t;
 3.888   0  EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t;
 6.671   0  EXPLAIN (ANALYZE, TIMING ON) SELECT COUNT(*) FROM t;

Pgbench reports about 11% lost just from TIMING OFF ANALYZE, and a further 45%
from TIMING ON. The per-row overhead, compared between TIMING ON/OFF:

((6.187 ms - 3.423 ms) * 1000000)/(100000 * 2) = 13.82 ns

which is within the run-to-run variance of the pg_test_timing result.

Greetings,

Andres Freund

[1] https://postgr.es/m/20230116023639.rn36vf6ajqmfciua%40awork3.anarazel.de
[2] https://www.postgresql.org/docs/current/pgtesttiming.html#id-1.9.5.11.7.3




Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-01-16 Thread Tom Lane
Andrew Dunstan  writes:
> On 2023-01-16 Mo 14:34, Tom Lane wrote:
>> I don't think that's any easier to read, and it risks masking
>> diffs that we'd wish to know about.

> OK, I'll make another pass and tighten things up.

Don't sweat it, I'm just working the bugs out of a new version.
I'll have something to post shortly, I hope.

regards, tom lane




Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-01-16 Thread Andrew Dunstan


On 2023-01-16 Mo 14:34, Tom Lane wrote:
> Andrew Dunstan  writes:
>> OK, here's my version. It tests clean against all of crake's dump files
>> back to 9.2.
>> To some extent it's a matter of taste, but I hate very long regex lines
>> - it makes it very hard to see what's actually changing, so I broke up
>> most of those.
> I don't mind breaking things up, but I'm not terribly excited about
> making the patterns looser, as you've done in some places like
>
>   if ($old_version < 14)
>   {
>   # Remove mentions of extended hash functions.
> - $dump =~
> -   s/^(\s+OPERATOR 1 =\(integer,integer\)) ,\n\s+FUNCTION 2 
> \(integer, integer\) public\.part_hashint4_noop\(integer,bigint\);/$1;/mg;
> - $dump =~
> -   s/^(\s+OPERATOR 1 =\(text,text\)) ,\n\s+FUNCTION 2 \(text, 
> text\) public\.part_hashtext_length\(text,bigint\);/$1;/mg;
> + $dump =~ s 
> {(^\s+OPERATOR\s1\s=\((?:integer,integer|text,text)\))\s,\n
> +\s+FUNCTION\s2\s.*?public.part_hash.*?;}
> +{$1;}mxg;
>   }
>
> I don't think that's any easier to read, and it risks masking
> diffs that we'd wish to know about.



OK, I'll make another pass and tighten things up.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2023-01-16 Thread Pavel Stehule
On Mon, Jan 16, 2023 at 21:34 Tomas Vondra <
tomas.von...@enterprisedb.com> wrote:

> Hi,
>
> there's minor bitrot in the Mkvcbuild.pm change, making cfbot unhappy.
>
> As for the patch, I don't have much comments. I'm wondering if it'd be
> useful to indicate which timing source was actually used for EXPLAIN
> ANALYZE, say something like:
>
>  Planning time: 0.197 ms
>  Execution time: 0.225 ms
>  Timing source: clock_gettime (or tsc)
>
> There has been a proposal to expose this as a GUC (or perhaps as explain
> option), to allow users to pick what timing source to use. I wouldn't go
> that far - AFAICS this is meant to be universally better when
> available. But knowing which source was used seems useful.
>

+1

Pavel


>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>


Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2023-01-16 Thread Tomas Vondra
Hi,

there's minor bitrot in the Mkvcbuild.pm change, making cfbot unhappy.

As for the patch, I don't have much comments. I'm wondering if it'd be
useful to indicate which timing source was actually used for EXPLAIN
ANALYZE, say something like:

 Planning time: 0.197 ms
 Execution time: 0.225 ms
 Timing source: clock_gettime (or tsc)

There has been a proposal to expose this as a GUC (or perhaps as explain
option), to allow users to pick what timing source to use. I wouldn't go
that far - AFAICS this is meant to be universally better when
available. But knowing which source was used seems useful.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-01-16 Thread Tom Lane
Andrew Dunstan  writes:
> OK, here's my version. It tests clean against all of crake's dump files
> back to 9.2.
> To some extent it's a matter of taste, but I hate very long regex lines
> - it makes it very hard to see what's actually changing, so I broke up
> most of those.

I don't mind breaking things up, but I'm not terribly excited about
making the patterns looser, as you've done in some places like

if ($old_version < 14)
{
# Remove mentions of extended hash functions.
-   $dump =~
- s/^(\s+OPERATOR 1 =\(integer,integer\)) ,\n\s+FUNCTION 2 
\(integer, integer\) public\.part_hashint4_noop\(integer,bigint\);/$1;/mg;
-   $dump =~
- s/^(\s+OPERATOR 1 =\(text,text\)) ,\n\s+FUNCTION 2 \(text, 
text\) public\.part_hashtext_length\(text,bigint\);/$1;/mg;
+   $dump =~ s 
{(^\s+OPERATOR\s1\s=\((?:integer,integer|text,text)\))\s,\n
+\s+FUNCTION\s2\s.*?public.part_hash.*?;}
+  {$1;}mxg;
}

I don't think that's any easier to read, and it risks masking
diffs that we'd wish to know about.

regards, tom lane




almost-super-user problems that we haven't fixed yet

2023-01-16 Thread Robert Haas
Due to cf5eb37c5ee0cc54c80d95c1695d7fca1f7c68cb,
e5b8a4c098ad6add39626a14475148872cd687e0, and prior commits touching
related code, it should now be possible to consider handing out
CREATEROLE as a reasonable alternative to handing out SUPERUSER. Prior
to cf5eb37c5ee0cc54c80d95c1695d7fca1f7c68cb, giving CREATEROLE meant
giving away control of pg_execute_server_programs and every other
built-in role, so it wasn't possible to give CREATEROLE to a user who
isn't completely trusted. Now, that should be OK. CREATEROLE users
will only gain control over roles they create (and any others that the
superuser grants to them). Furthermore, if you set
createrole_self_grant to 'inherit' or 'set, inherit', a CREATEROLE
user will automatically inherit the privileges of the users they
create, hopefully making them feel like they are almost a superuser
without letting them actually take over the world.

Not very surprisingly, those commits failed to solve every single
problem that anyone has ever thought about in this area.

Here is a probably-incomplete list of related problems that are so far unsolved:

1. It's still possible for a CREATEROLE user to hand out role
attributes that they don't possess. The new prohibitions in
cf5eb37c5ee0cc54c80d95c1695d7fca1f7c68cb prevent a CREATEROLE user
from handing out membership in a role on which they lack sufficient
permissions, but they don't prevent a CREATEROLE user who lacks
CREATEDB from creating a new user who does have CREATEDB. I think we
should subject the CREATEDB, REPLICATION, and BYPASSRLS attributes to
the same rule that we now use for role memberships: you've got to have
the property in order to give it to someone else. In the case of
CREATEDB, this would tighten the current rules, which allow you to
give out CREATEDB without having it. In the case of REPLICATION and
BYPASSRLS, this would liberalize the current rules: right now, a
CREATEROLE user cannot give REPLICATION or BYPASSRLS to another user
even if they possess those attributes.

This proposal doesn't address the CREATEROLE or CONNECTION LIMIT
properties. It seems possible to me that someone might want to set up
a CREATEROLE user who can't make more such users, and this proposal
doesn't manufacture any way of doing that. It also doesn't let you
constrain the ability of a CREATEROLE user to set a CONNECTION LIMIT
for some other user. I think that's OK. It might be nice to have ways
of imposing such restrictions at some point in the future, but it is
not very obvious what to do about such cases and, importantly, I don't
think there's any security impact from failing to address those cases.
If a CREATEROLE user without CREATEDB can create a new role that does
have CREATEDB, that's a privilege escalation. If they can hand out
CREATEROLE, that isn't: they already have it.

2. It's still impossible for a CREATEROLE user to execute CREATE
SUBSCRIPTION, so they can't get logical replication working. There was
a previous thread about fixing this at
https://www.postgresql.org/message-id/flat/9DFC88D3-1300-4DE8-ACBC-4CEF84399A53%40enterprisedb.com
and the corresponding CF entry is listed as committed, but
CreateSubscription() still requires superuser, so I think that maybe
that thread only got some of the preliminary permissions-check work
committed and the core problem is yet to be solved.

3. Only superusers can control event triggers. In the thread at
https://www.postgresql.org/message-id/flat/914FF898-5AC4-4E02-8A05-3876087007FB%40enterprisedb.com
it was proposed, based on an idea from Tom, to allow any user to
create event triggers but, approximately, to only have them fire for
code running as a user whose privileges the creator already has. I
don't recall the precise rule that was proposed and it might need
rethinking in view of 3d14e171e9e2236139e8976f3309a588bcc8683b, and I
think there was also some opposition to that proposal, so I'm not sure
what the way forward here is.

4. You can reserve a small number of connections for the superuser
with superuser_reserved_connections, but there's no way to do a
similar thing for any other user. As mentioned above, a CREATEROLE
user could set connection limits for every created role such that the
sum of those limits is less than max_connections by some margin, but
that restricts each of those roles individually, not all of them in
the aggregate. Maybe we could address this by inventing a new GUC
reserved_connections and a predefined role
pg_use_reserved_connections.

5. If you set createrole_self_grant = 'set, inherit' and make alice a
CREATEROLE user and she goes around and creates a bunch of other users
and they all run around and create a bunch of objects and then alice
tries to pg_dump the entire database, it will work ... provided that
there are no tables owned by any other user. If the superuser has
created any tables, or there's another CREATEROLE user wandering
around creating tables, or even a non-CREATEROLE user whose
permissions alice does not have, pg_dump 

Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-01-16 Thread Andrew Dunstan

On 2023-01-15 Su 18:37, Tom Lane wrote:
> Andrew Dunstan  writes:
>> Those replacement lines are very difficult to read. I think use of
>> extended regexes and some multi-part replacements would help. I'll have
>> a go at that tomorrow.
> Yeah, after I wrote that code I remembered about \Q ... \E, which would
> eliminate the need for most of the backslashes and probably make things
> better that way.  I didn't get around to improving it yet though, so
> feel free to have a go.
>
>   


OK, here's my version. It tests clean against all of crake's dump files
back to 9.2.

To some extent it's a matter of taste, but I hate very long regex lines
- it makes it very hard to see what's actually changing, so I broke up
most of those.

Given that we are looking at newlines in some places I decided that
after all it was important to convert CRLF to LF.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm b/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm
new file mode 100644
index 00..2134afd64e
--- /dev/null
+++ b/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm
@@ -0,0 +1,550 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+=pod
+
+=head1 NAME
+
+PostgreSQL::Test::AdjustUpgrade - helper module for cross-version upgrade tests
+
+=head1 SYNOPSIS
+
+  use PostgreSQL::Test::AdjustUpgrade;
+
+  # Build commands to adjust contents of old-version database before dumping
+  $statements = adjust_database_contents($old_version, %dbnames);
+
+  # Adjust contents of old pg_dumpall output file to match newer version
+  $dump = adjust_old_dumpfile($old_version, $dump);
+
+  # Adjust contents of new pg_dumpall output file to match older version
+  $dump = adjust_new_dumpfile($old_version, $dump);
+
+=head1 DESCRIPTION
+
+C<PostgreSQL::Test::AdjustUpgrade> encapsulates various hacks needed to
+compare the results of cross-version upgrade tests.
+
+=cut
+
+package PostgreSQL::Test::AdjustUpgrade;
+
+use strict;
+use warnings;
+
+use Exporter 'import';
+use PostgreSQL::Version;
+
+our @EXPORT = qw(
+  adjust_database_contents
+  adjust_old_dumpfile
+  adjust_new_dumpfile
+);
+
+=pod
+
+=head1 ROUTINES
+
+=over
+
+=item $statements = adjust_database_contents($old_version, %dbnames)
+
+Generate SQL commands to perform any changes to an old-version installation
+that are needed before we can pg_upgrade it into the current PostgreSQL
+version.
+
+Typically this involves dropping or adjusting no-longer-supported objects.
+
+Arguments:
+
+=over
+
+=item C<old_version>: Branch we are upgrading from.  This can be a branch
+name such as 'HEAD' or 'REL_11_STABLE', but it can also be any string
+that PostgreSQL::Version accepts.
+
+=item C<dbnames>: Hash of database names present in the old installation.
+
+=back
+
+Returns a reference to a hash, wherein the keys are database names and the
+values are arrayrefs to lists of statements to be run in those databases.
+
+=cut
+
+sub adjust_database_contents
+{
+	my ($old_version, %dbnames) = @_;
+	my $result = {};
+
+	# nothing to do for non-cross-version tests
+	return $result if $old_version eq 'HEAD';
+
+	# convert branch name to numeric form
+	$old_version =~ s/REL_?(\d+(?:_\d+)?)_STABLE/$1/;
+	$old_version =~ s/_/./;
+	$old_version = PostgreSQL::Version->new($old_version);
+
+	# remove dbs of modules known to cause pg_upgrade to fail
+	# anything not builtin and incompatible should clean up its own db
+	foreach my $bad_module ('test_ddl_deparse', 'tsearch2')
+	{
+		if ($dbnames{"contrib_regression_$bad_module"})
+		{
+			_add_st($result, 'postgres',
+"drop database contrib_regression_$bad_module");
+			delete($dbnames{"contrib_regression_$bad_module"});
+		}
+	}
+
+	# avoid version number issues with test_ext7
+	if ($dbnames{contrib_regression_test_extensions})
+	{
+		_add_st(
+			$result,
+			'contrib_regression_test_extensions',
+			'drop extension if exists test_ext7');
+	}
+
+	# stuff not supported from release 16
+	if ($old_version >= 12 && $old_version < 16)
+	{
+		# Can't upgrade aclitem in user tables from pre 16 to 16+.
+		_add_st($result, 'regression',
+			'alter table public.tab_core_types drop column aclitem');
+		# Can't handle child tables with locally-generated columns.
+		_add_st(
+			$result, 'regression',
+			'drop table public.gtest_normal_child',
+			'drop table public.gtest_normal_child2');
+	}
+
+	# stuff not supported from release 14
+	if ($old_version < 14)
+	{
+		# postfix operators (some don't exist in very old versions)
+		_add_st(
+			$result,
+			'regression',
+			'drop operator #@# (bigint,NONE)',
+			'drop operator #%# (bigint,NONE)',
+			'drop operator if exists !=- (bigint,NONE)',
+			'drop operator if exists #@%# (bigint,NONE)');
+
+		# get rid of dblink's dependencies on regress.so
+		my $regrdb =
+		  $old_version le '9.4'
+		  ? 'contrib_regression'
+		  : 'contrib_regression_dblink';
+
+		if ($dbnames{$regrdb})
+		{
+			_add_st(
+$result, $regrdb,
+'drop function if exists 

Re: Record queryid when auto_explain.log_verbose is on

2023-01-16 Thread Justin Pryzby
On Mon, Jan 16, 2023 at 09:36:59PM +0900, torikoshia wrote:
> Attached patch makes auto_explain also print query identifiers.

This was asked during the initial patch; does your patch address the
issues here ?

https://www.postgresql.org/message-id/20200308142644.vlihk7djpwqjkp7w%40nol

-- 
Justin




Re: New strategies for freezing, advancing relfrozenxid early

2023-01-16 Thread Peter Geoghegan
On Sun, Jan 15, 2023 at 9:13 PM Dilip Kumar  wrote:
> I have looked into the patch set. I think 0001 looks good to me; about
> 0002 I have a few questions; 0003 I haven't yet looked at

Thanks for taking a look.

> I think '(nextXID - cutoffs->relfrozenxid) / freeze_table_age' should
> be the actual fraction right?  What is the point of adding 0.5 to the
> divisor?  If there is a logical reason, maybe we can explain in the
> comments.

It's just a way of avoiding division by zero.

> While looking into the logic of 'lazy_scan_strategy', I think the idea
> looks very good but the only thing is that
> we have kept eager freeze and eager scan completely independent.
> Don't you think that if a table is chosen for an eager scan
> then we should force the eager freezing as well?

Earlier versions of the patch kind of worked that way.
lazy_scan_strategy would actually use twice the GUC setting to
determine scanning strategy. That approach could make our "transition
from lazy to eager strategies" involve an excessive amount of
"catch-up freezing" in the VACUUM operation that advanced relfrozenxid
for the first time, which you see an example of here:

https://wiki.postgresql.org/wiki/Freezing/skipping_strategies_patch:_motivating_examples#Patch

Now we treat the scanning and freezing strategies as two independent
choices. Of course they're not independent in any practical sense, but
I think it's slightly simpler and more elegant that way -- it makes
the GUC vacuum_freeze_strategy_threshold strictly about freezing
strategy, while still leading to VACUUM advancing relfrozenxid in a
way that makes sense. It just happens as a second order effect. Why
add a special case?

In principle the break-even point for eager scanning strategy (i.e.
advancing relfrozenxid) is based on the added cost only under this
scheme. There is no reason for lazy_scan_strategy to care about what
happened in the past to make the eager scanning strategy look like a
good idea. Similarly, there isn't any practical reason why
lazy_scan_strategy needs to anticipate what will happen in the near
future with freezing.

-- 
Peter Geoghegan




Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2023-01-16 Thread Andres Freund
Hi,

On 2023-01-02 14:28:20 +0100, David Geier wrote:
> I also somewhat improved the accuracy of the cycles to milli- and
> microseconds conversion functions by having two more multipliers with higher
> precision. For microseconds we could also keep the computation integer-only.
> I'm wondering what to best do for seconds and milliseconds. I'm currently
> leaning towards just keeping it as is, because the durations measured and
> converted are usually long enough that precision shouldn't be a problem.

I'm doubtful this is worth the complexity it incurs. By the time we convert
out of the instr_time format, the times shouldn't be small enough that the
accuracy is affected much.

Looking around, most of the existing uses of INSTR_TIME_GET_MICROSEC()
actually accumulate themselves, and should instead keep things in the
instr_time format and convert later. We'd win more accuracy / speed that way.

I don't think the introduction of pg_time_usec_t was a great idea, but oh
well.


> Additionally, I initialized a few variables of type instr_time which
> otherwise resulted in warnings due to use of potentially uninitialized
> variables.

Unless we decide, as I suggested downthread, that we deprecate
INSTR_TIME_SET_ZERO(), that's unfortunately not the right fix. I've a similar
patch that adds all the necessary INSTR_TIME_SET_ZERO() calls.


> What about renaming INSTR_TIME_GET_DOUBLE() to INSTR_TIME_GET_SECS() so that
> it's consistent with the _MILLISEC() and _MICROSEC() variants?

> The INSTR_TIME_GET_MICROSEC() returns a uint64 while the other variants
> return double. This seems error prone. What about renaming the function or
> also have the function return a double and cast where necessary at the call
> site?

I think those should be a separate discussion / patch.

Greetings,

Andres Freund




Re: Inconsistency in vacuum behavior

2023-01-16 Thread Alexander Pyhalov

Nikita Malakhov wrote 2023-01-16 20:12:

Hi,

Currently there is no error in this case, so an additional thrown error
would require a new test.
Besides, throwing an error here does not make sense: it is just a
check for vacuum
permission, so I think the right way is to simply skip a relation that is
not suitable for vacuum.
Any thoughts or objections?



No objections to not throwing an error.

--
Best regards,
Alexander Pyhalov,
Postgres Professional




Re: doc: add missing "id" attributes to extension packaging page

2023-01-16 Thread Karl O. Pinc
On Sun, 15 Jan 2023 18:01:50 -0600
"Karl O. Pinc"  wrote:

> Regards XSLT:
> 
> I believe the XSLT needs work.

> I believe that overriding the XSLT by copying the original and making
> modifications is the "wrong way" (TM).  I think that the right way is
> to declare a xsl:default-mode somewhere in the stylesheets.  There
> does not seem to be one, so the default mode for PG doc processing
> could be something like "postgres-mode".  And I'd expect it to be
> declared somewhere right at the top of the xml hierarchy.  (I forget
> what that is, probably something like "document" or "book" or
> something.)  Then, you'd write your <xsl:template match="varlistentry"> templates (and the like) with
> a mode of "postgres-mode", and have your templates call the
> "traditional default", "modeless", templates.  That way your not
> copying and patching upstream code, but are instead, in effect,
> calling it as a subroutine.
> 
> This should work especially well since, I think, you're just adding
> new output to what the upstream templates do.  You're not trying to
> insert new output into the middle of the stock output or otherwise
> modify the stock output.
> 
> You're adding only about 6 lines of XSLT to the upstream templates,
> and copying 100+ lines.  There must be a better way.
> 
> See: https://www.w3.org/TR/xslt-30/#modes
> 
> I've never tried this, although I do recall doing something or another
> with modes in the past.  And I've not gone so far as to figure out
> (again?) how to call a "modeless template", so you can invoke the
> original, upstream, templates.  And setting a default mode seems like
> something of a "big hammer", so should probably be checked over by
> someone who's more steeped in Postgres docs than myself.  (Like a
> committer. :) But I believe it is the way to go.  To make it work
> you'll need to figure out the XSLT mode selection process and make
> sure that it first selects the "postgres-mode", and then the modeless
> templates, and also still works right when a template calls another
> and explicitly sets a mode.

Drat.  I forgot.  We're using xsltproc which is XSLT 1.0.

So this is the relevant spec:

https://www.w3.org/TR/1999/REC-xslt-19991116#modes

In XSLT 1.0 there is no xsl:default-mode.  So I _think_ what you do then
is modify the built-in template rules so that the (default) template
(mode='') is invoked when there is no 'postgres-mode' version of the
template, but otherwise the 'postgres-mode' version of the template
is invoked.  Your 'postgres-mode' templates will xsl:call-template
the default template, adding whatever they want to the output produced
by the default template.

See: https://www.w3.org/TR/1999/REC-xslt-19991116#built-in-rule

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: Inconsistency in vacuum behavior

2023-01-16 Thread Nikita Malakhov
Hi,

Currently there is no error in this case, so an additional thrown error would
require a new test.
Besides, throwing an error here does not make sense - it is just a check
for vacuum permission; I think the right way is to just skip a relation
that is not suitable for vacuum.
Any thoughts or objections?

On Mon, Jan 16, 2023 at 7:46 PM Alexander Pyhalov 
wrote:

> Nikita Malakhov писал 2023-01-16 17:26:
> > Hi!
> >
> > Here's the patch that fixes this case, please check it out.
> > The patch adds vacuum_is_permitted_for_relation() check before adding
> > partition relation to the vacuum list, and if permission is denied the
> > relation
> > is not added, so it is not passed to vacuum_rel() and there is no
> > attempt to acquire the lock.
> >
> > Cheers!
>
> Hi.
>
> The patch seems to solve the issue.
> Two minor questions I have:
> 1) should we error out if HeapTupleIsValid(part_tuple) is false?
> 2) comment "Check partition relations for vacuum permit" seems to be
> broken in some way.
>
> --
> Best regards,
> Alexander Pyhalov,
> Postgres Professional
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: Make EXPLAIN generate a generic plan for a parameterized query

2023-01-16 Thread Tom Lane
Jim Jones  writes:
> However, when GENERIC_PLAN is used combined with BUFFERS, the 'Buffers' 
> node is shown the first time the query executed in a session:

> psql (16devel)
> Type "help" for help.

> postgres=# \c db
> You are now connected to database "db" as user "postgres".
> db=# EXPLAIN (BUFFERS, GENERIC_PLAN) SELECT * FROM t WHERE col = $1;
>     QUERY PLAN
> -
>   Index Only Scan using t_col_idx on t  (cost=0.42..4.44 rows=1 width=11)
>     Index Cond: (col = $1)
>   Planning:
>     Buffers: shared hit=62
> (4 rows)

> db=# EXPLAIN (BUFFERS, GENERIC_PLAN) SELECT * FROM t WHERE col = $1;
>     QUERY PLAN
> -
>   Index Only Scan using t_col_idx on t  (cost=0.42..4.44 rows=1 width=11)
>     Index Cond: (col = $1)
> (2 rows)

That's not new to this patch, the same thing happens without it.
It's reflecting catalog accesses involved in loading per-session
caches (which, therefore, needn't be repeated the second time).

> Also, this new parameter seems to work only within parentheses 
> `(GENERIC_PLAN)`:

> db=# EXPLAIN GENERIC_PLAN SELECT * FROM t WHERE col = $1;
> ERROR:  syntax error at or near "GENERIC_PLAN"
> LINE 1: EXPLAIN GENERIC_PLAN SELECT * FROM t WHERE col = $1;

That's true of all but the oldest EXPLAIN options.  We don't do that
anymore because new options would have to become grammar keywords
to support that.

> On a very personal note: wouldn't just GENERIC (without _PLAN) suffice? 
> Don't bother with it if you disagree :-)

FWIW, "GENERIC" would be too generic for my taste ;-).  But I agree
it's a judgement call.

regards, tom lane




Re: Inconsistency in vacuum behavior

2023-01-16 Thread Alexander Pyhalov

Nikita Malakhov писал 2023-01-16 17:26:

Hi!

Here's the patch that fixes this case, please check it out.
The patch adds vacuum_is_permitted_for_relation() check before adding
partition relation to the vacuum list, and if permission is denied the
relation
is not added, so it is not passed to vacuum_rel() and there is no
attempt to acquire the lock.

Cheers!


Hi.

The patch seems to solve the issue.
Two minor questions I have:
1) should we error out if HeapTupleIsValid(part_tuple) is false?
2) comment "Check partition relations for vacuum permit" seems to be 
broken in some way.


--
Best regards,
Alexander Pyhalov,
Postgres Professional




Re: Logical replication timeout problem

2023-01-16 Thread Ashutosh Bapat
On Wed, Jan 11, 2023 at 4:11 PM wangw.f...@fujitsu.com
 wrote:
>
> On Mon, Jan 9, 2023 at 13:04 PM Amit Kapila  wrote:
> >
>
> Thanks for your comments.
>
> > One more thing, I think it would be better to expose a new callback
> > API via reorder buffer as suggested previously [2] similar to other
> > reorder buffer APIs instead of directly using reorderbuffer API to
> > invoke plugin API.
>
> Yes, I agree. I think it would be better to add a new callback API on the 
> HEAD.
> So, I improved the fix approach:
> Introduce a new optional callback to update the progress. This callback
> function is invoked at the end inside the main loop of the function
> ReorderBufferProcessTXN() for each change. In this way, I think
> similar timeout problems could be avoided.

I am a bit worried about the indirections that the wrappers and hooks
create. Output plugins call OutputPluginUpdateProgress() in callbacks
but I don't see why  ReorderBufferProcessTXN() needs a callback to
call OutputPluginUpdateProgress. I don't think output plugins are
going to do anything special with that callback other than just call
OutputPluginUpdateProgress. Every output plugin will need to implement
it and if they do not they will face the timeout problem. That would
be unnecessary. Instead ReorderBufferUpdateProgress() in your first
patch was more direct and readable. That way the fix works for any
output plugin. In fact, I am wondering whether we could have a call in
ReorderBufferProcessTxn() at the end of transaction
(commit/prepare/commit prepared/abort prepared) instead of the
corresponding output plugin callbacks calling
OutputPluginUpdateProgress().


-- 
Best Wishes,
Ashutosh Bapat




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-16 Thread Tomas Vondra
Hi Amit,

Thanks for the patch, the changes seem reasonable to me and it does fix
the issue in the sequence decoding patch.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Rethinking the implementation of ts_headline()

2023-01-16 Thread Tom Lane
Alvaro Herrera  writes:
> I came across #17556 which contains a different test for this, and I'm
> not sure that this patch changes things completely for the better.

Thanks for looking at my patch.  However ...

> That is, once past the 5000 words of distance, it fails to find a good
> cover, but before that it returns an acceptable headline.  However,
> after your proposed patch, we get this:

>  ts_headline │ ts_headline 
> ─┼─
>  {ipsum} │ {ipsum}
> (1 fila)

I get this with the patch:

 ts_headline | ts_headline 
-+-
 {ipsum} ... {labor} | {ipsum} ... {labor}
(1 row)

which is what I'd expect, because it removes the artificial limit on
cover length that I added in 78e73e875.  So I'm wondering why you got a
different result.  Maybe something to do with locale?  I tried it in
C and en_US.utf8.

regards, tom lane




Re: Logical replication timeout problem

2023-01-16 Thread Ashutosh Bapat
On Mon, Jan 9, 2023 at 4:08 PM wangw.f...@fujitsu.com
 wrote:
>
> On Fri, Jan 6, 2023 at 15:06 PM Ashutosh Bapat  
> wrote:
>
> I'm not sure if we need to add global variables or member variables for a
> cumulative count that is only used here. How would you feel if I add some
> comments when declaring this static variable?

I see WalSndUpdateProgress::sendTime is static already. So this seems
fine. A comment will help sure.

>
> > +
> > +if (!ctx->update_progress)
> > +return;
> > +
> > +Assert(!ctx->fast_forward);
> > +
> > +/* set output state */
> > +ctx->accept_writes = false;
> > +ctx->write_xid = txn->xid;
> > +ctx->write_location = change->lsn;
> > +ctx->end_xact = false;
> >
> > This patch reverts many of the changes of the previous commit which tried to
> > fix this issue i.e. 8df2374. end_xact was introduced by the same commit 
> > but
> > without much explanation of that in the commit message. Its only user,
> > WalSndUpdateProgress(), is probably making a wrong assumption as well.
> >
> >  * We don't have a mechanism to get the ack for any LSN other than end
> >  * xact LSN from the downstream. So, we track lag only for end of
> >  * transaction LSN.
> >
> > IIUC, WAL sender tracks the LSN of the last WAL record read in sentPtr 
> > which is
> > sent downstream through a keep alive message. Downstream may
> > acknowledge this
> > LSN. So we do get ack for any LSN, not just commit LSN.
> >
> > So I propose removing end_xact as well.
>
> We didn't track the lag during a transaction because it could make the
> calculations of lag functionality inaccurate. If we track every lsn, it could
> fail to record important lsn information because of
> WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS (see function WalSndUpdateProgress).
> Please see details in [1] and [2].

LagTrackerRead() interpolates to reduce the inaccuracy. I don't
understand why we need to track the end LSN only. But I don't think
that affects this fix. So I am fine if we want to leave end_xact
there.

-- 
Best Wishes,
Ashutosh Bapat




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-16 Thread Robert Haas
On Fri, Jan 13, 2023 at 9:09 PM Andres Freund  wrote:
> > > If I understand the patch correctly, we now have the following age based
> > > thresholds for av:
> > >
> > > - force-enable autovacuum:
> > >   oldest_datfrozenxid + autovacuum_freeze_max_age < nextXid
> > > - autovacuum based on age:
> > >   freeze_max_age = Min(autovacuum_freeze_max_age, table_freeze_max_age)
> > > tableagevac = relfrozenxid < recentXid - freeze_max_age
> > > - prevent auto-cancellation:
> > >   freeze_max_age = Min(autovacuum_freeze_max_age, table_freeze_max_age)
> > >   prevent_auto_cancel_age = Min(freeze_max_age * 2, 1 billion)
> > >   prevent_auto_cancel = relfrozenxid < recentXid - prevent_auto_cancel_age
> > >
> > > Is that right?
> >
> > That summary looks accurate, but I'm a bit confused about why you're
> > asking the question this way. I thought that it was obvious that the
> > patch doesn't change most of these things.
>
> For me it was helpful to clearly list the triggers when thinking about the
> issue. I found the diff hard to read and, as noted above, the logic for the
> auto cancel threshold quite confusing, so ...

I really dislike formulas like Min(freeze_max_age * 2, 1 billion).
That looks completely magical from a user perspective. Some users
aren't going to understand autovacuum behavior at all. Some will, and
will be able to compare age(relfrozenxid) against
autovacuum_freeze_max_age. Very few people are going to think to
compare age(relfrozenxid) against some formula based on
autovacuum_freeze_max_age. I guess if we document it, maybe they will.

But even then, what's the logic behind that formula? I am not entirely
convinced that we need to separate the force-a-vacuum threshold from
the don't-cancel threshold, but if we do separate them, what's the
purpose of having the clearance between them increase as you increase
autovacuum_freeze_max_age from 0 to 500 million, and thereafter
decrease until it reaches 0 at 1 billion? I can't explain the logic
behind that except by saying "well, somebody came up with an arbitrary
formula".

I do like the idea of driving the auto-cancel behavior off of the
results of previous attempts to vacuum the table. That could be done
independently of the XID age of the table. If we've failed to vacuum
the table, say, 10 times, because we kept auto-cancelling, it's
probably appropriate to force the issue. It doesn't really matter
whether the autovacuum triggered because of bloat or because of XID
age. Letting either of those things get out of control is bad. What I
think happens fairly commonly right now is that the vacuums just keep
getting cancelled until the table's XID age gets too old, and then we
finally force the issue. But at that point a lot of harm has already
been done. In a frequently updated table, waiting 300 million XIDs to
stop cancelling the vacuum is basically condemning the user to have to
run VACUUM FULL. The table can easily be ten or a hundred times bigger
than it should be by that point.

And that's a big reason why I am skeptical about the patch as
proposed. It raises the threshold for auto-cancellation in cases where
it's sometimes already far too high.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgsql: Add new GUC createrole_self_grant.

2023-01-16 Thread Robert Haas
On Mon, Jan 16, 2023 at 10:33 AM David G. Johnston
 wrote:
> I’m moving on as well.  Go with what you have.  I have my personal 
> understanding clarified at this point.  If the docs need more work people 
> will ask questions to help guide such work.

Yeah, I hope so.

It's becoming increasingly clear to me that we haven't put enough
effort into clarifying what I will broadly call "trust issues" in the
documentation. It's bad if you call untrusted code that runs as you,
and it's bad if code that runs as you gets called by untrusted people
for whose antics you are not sufficiently prepared, and there are a
lot of ways those things can happen: direct function calls,
operators, triggers, row-level security, views, index or materialized
view rebuilds, etc. I think it would be good to have a general
treatment of those issues in the documentation written by a
security-conscious hacker or hackers who are really familiar both with
the behavior of the system and also able to make the security
consequences understandable to people who are not so deeply invested
in PostgreSQL. I don't want to do that on this thread, but to the
extent that you're arguing that the current treatment is inadequate,
I'm fully in agreement with that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: allowing for control over SET ROLE

2023-01-16 Thread Robert Haas
On Fri, Jan 13, 2023 at 2:17 AM Noah Misch  wrote:
> Since the text is superfluous but not wrong, I won't insist.

OK, committed as I had it, then.

To me, the text isn't superfluous, because otherwise the connection to
what has been said in the previous sentence seems tenuous, which
impacts understandability. We'll see what other people think, I guess.
Perhaps there's some altogether better way to talk about this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgsql: Add new GUC createrole_self_grant.

2023-01-16 Thread David G. Johnston
On Monday, January 16, 2023, Robert Haas  wrote:
>
>
> I don't really think there's too much wrong with what I wrote in the
> patch as proposed, and I would like to get it committed and move on
> without getting drawn into a wide-ranging discussion of every way in
> which we might be able to improve the surrounding structure.
>

I’m moving on as well.  Go with what you have.  I have my personal
understanding clarified at this point.  If the docs need more work people
will ask questions to help guide such work.

David J.


Re: PL/Python: Fix return in the middle of PG_TRY() block.

2023-01-16 Thread Tom Lane
Xing Guo  writes:
> Are there any unsafe codes in pltcl.c? The return statement is in the
> PG_CATCH() block, I think the exception stack has been recovered in
> PG_CATCH block so the return statement in PG_CATCH block should be ok?

Yes, the stack has already been unwound at the start of a PG_CATCH
(or PG_FINALLY) block, so there is no reason to avoid returning
out of those.

In principle you could also mess things up with a "continue", "break",
or "goto" leading out of PG_TRY.  That's probably far less likely
than "return", but I wonder whether Andres' compiler hack will
catch that.

regards, tom lane




Re: pgsql: Add new GUC createrole_self_grant.

2023-01-16 Thread Robert Haas
On Sat, Jan 14, 2023 at 8:12 PM David G. Johnston
 wrote:
> OK, given all of that, I suggest reworking the first paragraph of security 
> definer functions safely to something like the following:
>
> (Replace: Because a SECURITY DEFINER function is executed with the privileges 
> of the user that owns it, care is needed to ensure that the function cannot 
> be misused. For security, search_path should be set to exclude any schemas 
> writable by untrusted users.) with:
>
> The execution of a SECURITY DEFINER function has two interacting behaviors 
> that make writing and administering such functions require extra care.  While 
> the privileges that come into play during execution are those of the function 
> owner, the execution environment is inherited from the calling context.  
> Therefore, any settings that the function relies upon must be specified in 
> the SET clause of the CREATE command (or within the body of the function).
>
> Of particular importance is the search_path setting.  The search_path should 
> be set to the bare minimum required for the function to operate and, more 
> importantly, not include any schemas writable by untrusted users.
>
> 
> This prevents malicious users [...]
> (existing example)
> [...] the function could be subverted by creating a temporary table named 
> pwds.
> 

I find this wording less clear than what we have now. And I reiterate
that the purpose of the patch under discussion is to add a mention of
the new GUC to an existing section, not to rewrite that section -- or
any other section -- of the documentation.

> 
> Another setting of note (at least in the case that the function owner is not 
> a superuser) is createrole_self_grant.  While the function owner has their 
> own pg_db_role_setting preference for this setting, when wrapping execution 
> of CREATE ROLE within a function, particularly to be executed by others, it 
> is the executor's setting that would be in effect, not the owner's.
> 

I think these sentences are really contorted, and they are also
factually incorrect. For this setting to matter, you need (1) the
function to be running CREATEROLE and (2) the owner of that function
to not be a superuser. You've put those facts at opposite ends of the
sentence. You've also brought pg_db_role_setting into it, which
doesn't matter here because it's applied at login, and a security
definer function doesn't log in as the user to which it switches.
There really is no such thing as "the owner's" setting. There may be a
setting which is applied to the owner's session if the owner logs in,
but there's no default value for all code run as the owner -- perhaps
there should be, but that's not how it works. I don't think we have
much precedent for using the word "executor" to mean "the user who
called a function" as opposed to "the code that executed a planned
query".

I don't really think there's too much wrong with what I wrote in the
patch as proposed, and I would like to get it committed and move on
without getting drawn into a wide-ranging discussion of every way in
which we might be able to improve the surrounding structure.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: macOS versioned sysroot

2023-01-16 Thread Tom Lane
Peter Eisentraut  writes:
> src/tools/darwin_sysroot (previously in src/template/darwin) contains this:
> # [...] Using a version-specific sysroot seems
> # desirable, so if the path is a non-version-specific symlink, expand
> # it.

> This code has been whacked around quite a bit, so it's hard to find the 
> origin of this.  But I'm going to submit a vote for "seems *not* desirable".

The reasoning for this is in the commit log for 4823621db:

Also, "xcrun --show-sdk-path" may give a path that is valid but lacks
any OS version identifier.  We don't really want that, since most
of the motivation for wiring -isysroot into the build flags at all
is to ensure that all parts of a PG installation are built against
the same SDK, even when considering extensions built later and/or on
a different machine.  Insist on finding "N.N" in the directory name
before accepting the result.  (Adding "--sdk macosx" to the xcrun
call seems to produce the same answer as xcodebuild, but usually
more quickly because it's cached, so we also try that as a fallback.)

The core reason why we don't want to use Xcode's default SDK in cases
like this is that Apple's technology for introducing new syscalls
does not play nice with Autoconf: for example, configure will think
that preadv/pwritev exist when using a Big Sur SDK, even when building
on an older macOS version where they don't exist.  It'd be nice to
have a better solution to that problem, but this patch doesn't attempt
to fix that.

Discussion: 
https://postgr.es/m/ed3b8e5d-0da8-6ebd-fd1c-e0ac80a4b...@postgrespro.ru

Re-reading the linked discussion makes me quite loath to remove
the version dependency logic; we'd certainly just be reinventing
that wheel next time Apple adds a new syscall that we care about.

Perhaps it would be adequate if we could select MacOSX13.sdk instead
of MacOSX13.1.sdk given the available choices:

$ ll 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs
total 0
drwxr-xr-x  7 root  wheel  224 Nov 12 16:18 MacOSX.sdk/
lrwxr-xr-x  1 root  wheel   10 Dec 14 10:51 MacOSX13.1.sdk@ -> MacOSX.sdk
lrwxr-xr-x  1 root  wheel   10 Dec 14 10:51 MacOSX13.sdk@ -> MacOSX.sdk

It does not seem that xcrun/xcodebuild will offer that, but we
could contemplate putting in some ad-hoc pathname munging to
strip any dot-digits part.
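For instance (illustrative only, not a committed approach), the dot-digits
part could be stripped with something like:

```shell
# Hypothetical munging: turn a minor-versioned SDK path into the
# major-only symlink name, e.g. MacOSX13.1.sdk -> MacOSX13.sdk.
sdk_path="/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.1.sdk"
echo "$sdk_path" | sed -E 's/(MacOSX[0-9]+)\.[0-9]+\.sdk$/\1.sdk/'
```

Paths that already lack a minor version would pass through unchanged.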

regards, tom lane




Re: [PATCH] Add <<none>> support to sepgsql_restorecon

2023-01-16 Thread Joe Conway

On 1/16/23 09:55, Ted Toth wrote:



On Sun, Jan 15, 2023 at 1:11 PM Joe Conway wrote:


On 11/21/22 17:35, Joe Conway wrote:
 > On 11/21/22 15:57, Ted Toth wrote:
 >> In SELinux file context files you can specify <<none>> for a file
 >> meaning you don't want restorecon to relabel it. <<none>> is
 >> especially useful in an SELinux MLS environment when objects are
 >> created at a specific security level and you don't want restorecon to
 >> relabel them to the wrong security level.
 >
 > +1
 >
 > Please add to the next commitfest here:
 > https://commitfest.postgresql.org/41/



Comments:

1. It seems like the check for a "<<none>>" context should go into
sepgsql_object_relabel() directly rather than exec_object_restorecon().
The former gets registered as a hook in _PG_init(), so with the
current location we would fail to skip the relabel when that gets
called.


The intent is not to stop all relabeling, only to stop sepgsql_restorecon 
from doing a bulk relabel. I believe sepgsql_object_relabel is called by 
the 'SECURITY LABEL' statement, which I'm using to set the label of db 
objects to a specific context that I would not want altered later by a 
restorecon.



Ok, sounds reasonable. Maybe just add a comment to that effect.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: [PATCH] Add <<none>> support to sepgsql_restorecon

2023-01-16 Thread Ted Toth
On Sun, Jan 15, 2023 at 1:11 PM Joe Conway  wrote:

> On 11/21/22 17:35, Joe Conway wrote:
> > On 11/21/22 15:57, Ted Toth wrote:
> >> In SELinux file context files you can specify <<none>> for a file
> >> meaning you don't want restorecon to relabel it. <<none>> is
> >> especially useful in an SELinux MLS environment when objects are
> >> created at a specific security level and you don't want restorecon to
> >> relabel them to the wrong security level.
> >
> > +1
> >
> > Please add to the next commitfest here:
> > https://commitfest.postgresql.org/41/
>
>
> Comments:
>
> 1. It seems like the check for a "<<none>>" context should go into
> sepgsql_object_relabel() directly rather than exec_object_restorecon().
> The former gets registered as a hook in _PG_init(), so with the
> current location we would fail to skip the relabel when that gets called.
>

The intent is not to stop all relabeling, only to stop sepgsql_restorecon
from doing a bulk relabel. I believe sepgsql_object_relabel is called by
the 'SECURITY LABEL' statement, which I'm using to set the label of db
objects to a specific context that I would not want altered later by a
restorecon.


> 2. Please provide one or more test case (likely in label.sql)
>
> 3. An example, or at least a note, mentioning "<<none>>" context and the
> implications would be appropriate.
>
> --
> Joe Conway
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>
>


Re: [PATCH] Fix alter subscription concurrency errors

2023-01-16 Thread vignesh C
On Wed, 12 Oct 2022 at 10:48, Michael Paquier  wrote:
>
> On Fri, Sep 09, 2022 at 06:37:07PM +0200, Alvaro Herrera wrote:
> > Would it work to use get_object_address() instead?  That would save
> > having to write a lookup-and-lock function with a retry loop for each
> > object type.
>
> Jeite, this thread is waiting for your input.  This is a bug fix, so I
> have moved this patch to the next CF for now to keep track of it.

Jeite, please post an updated version with the fixes. As CommitFest
2023-01 is currently underway, this would be an excellent time to
update the patch.

Regards,
Vignesh




Re: [PATCH] Completed unaccent dictionary with many missing characters

2023-01-16 Thread vignesh C
On Fri, 4 Nov 2022 at 04:59, Ian Lawrence Barwick  wrote:
>
> On Wed, 13 Jul 2022 at 19:13, Przemysław Sztoch wrote:
> >
> > Dear Michael P.,
> >
> > 3. The matter is not that simple. When I change priorities (i.e. 
> > Latin-ASCII.xml is less important than Unicode decomposition),
> > then "U+33D7" changes not to pH but to PH.
> > In the end, I left it like it was before ...
> >
> > If you decide what to do with point 3, I will correct it and send new 
> > patches.
> >
> > What is your decision?
> > Option 1: We leave x as in Latin-ASCII.xml and we also have full 
> > compatibility with previous PostgreSQL versions.
> > If they fix Latin-ASCII.xml at Unicode, it will fix itself.
> >
> > Option 2:  We choose a lower priority for entries in Latin-ASCII.xml
> >
> > I would choose option 1.
> >
> > P.S. I will be going on vacation and it would be nice to close this patch 
> > soon. TIA.
>
> Hi
>
> This entry was marked as "Needs review" in the CommitFest app but cfbot
> reports the patch no longer applies.
>
> We've marked it as "Waiting on Author". As CommitFest 2022-11 is
> currently underway, this would be an excellent time to update the patch.
>
> Once you think the patchset is ready for review again, you (or any
> interested party) can  move the patch entry forward by visiting
>
> https://commitfest.postgresql.org/40/3631/
>
> and changing the status to "Needs review".

I was not sure whether you are planning to post an updated version of
the patch, as it has been awaiting your attention since the last
commitfest. Please post an updated version soon or update the
commitfest entry accordingly.  As CommitFest 2023-01 is currently
underway, this would be an excellent time to update the patch.

Regards,
Vignesh




Re: Allow parallel plan for referential integrity checks?

2023-01-16 Thread vignesh C
On Mon, 12 Dec 2022 at 22:06, Frédéric Yhuel  wrote:
>
>
>
> On 12/11/22 06:29, Ian Lawrence Barwick wrote:
> > On Tue, 26 Jul 2022 at 20:58, Frédéric Yhuel wrote:
> >>
> >>
> >>
> >> On 4/14/22 14:25, Frédéric Yhuel wrote:
> >>>
> >>>
> >>> On 3/19/22 01:57, Imseih (AWS), Sami wrote:
>  I looked at your patch and it's a good idea to make foreign key
>  validation
>  use parallel query on large relations.
> 
>  It would be valuable to add logging to ensure that the ActiveSnapshot
>  and TransactionSnapshot
>  is the same for the leader and the workers. This logging could be
>  tested in the TAP test.
> 
>  Also, inside RI_Initial_Check you may want to set max_parallel_workers to
>  max_parallel_maintenance_workers.
> 
>  Currently the work_mem is set to maintenance_work_mem. This will also
>  require
>  a doc change to call out.
> 
>  /*
> * Temporarily increase work_mem so that the check query can be
>  executed
> * more efficiently.  It seems okay to do this because the query
>  is simple
> * enough to not use a multiple of work_mem, and one typically
>  would not
> * have many large foreign-key validations happening
>  concurrently.  So
> * this seems to meet the criteria for being considered a
>  "maintenance"
> * operation, and accordingly we use maintenance_work_mem.
>  However, we
> 
> >>>
> >>> Hello Sami,
> >>>
> >>> Thank you for your review!
> >>>
> >>> I will try to do as you say, but it will take time, since my daily job
> >>> as database consultant takes most of my time and energy.
> >>>
> >>
> >> Hi,
> >>
> >> As suggested by Jacob, here is a quick message to say that I didn't find
> >> enough time to work further on this patch, but I didn't completely
> >> forget it either. I moved it to the next commitfest. Hopefully I will
> >> find enough time and motivation in the coming months :-)
> >
> > Hi Frédéric
> >
> > This patch has been carried forward for a couple more commitfests since
> > your message; do you think you'll be able to work on it further during this
> > release cycle?
> >
>
> Hi Ian,
>
> I've planned to work on it full time on week 10 (6-10 March), if you
> agree to bear with me. The idea would be to bootstrap my brain on it,
> and then continue to work on it from time to time.

I have moved this to the March commitfest, as the patch update is
expected to happen during that commitfest.

Regards,
Vignesh




Re: Adding CommandID to heap xlog records

2023-01-16 Thread vignesh C
On Thu, 3 Nov 2022 at 15:06, Ian Lawrence Barwick  wrote:
>
> > On Fri, 30 Sep 2022 at 1:04, Matthias van de Meent wrote:
> >
> > On Wed, 28 Sept 2022 at 19:40, Bruce Momjian  wrote:
> > >
> > > On Thu, Sep 22, 2022 at 11:12:32PM +0200, Matthias van de Meent wrote:
> > > > On Thu, 8 Sept 2022 at 23:24, Tom Lane  wrote:
> > > > >
> > > > > Matthias van de Meent  writes:
> > > > > > Please find attached a patch that adds the CommandId of the 
> > > > > > inserting
> > > > > > transaction to heap (batch)insert, update and delete records. It is
> > > > > > based on the changes we made in the fork we maintain for Neon.
> > > > >
> > > > > This seems like a very significant cost increment with returns
> > > > > to only a minuscule number of users.  We certainly cannot consider
> > > > > it unless you provide some evidence that that impression is wrong.
> > > >
> > > > Attached a proposed set of patches to reduce overhead of the initial 
> > > > patch.
> > >
> > > This might be obvious to some, but the patch got a lot larger.  :-(
> >
> > Sorry for that, but updating the field from which the redo manager
> > should pull its information does indeed touch a lot of files because
> > most users of xl_info are only interested in the 4 bits reserved for
> > the redo-manager. Most of 0001 is therefore updates to point code to
> > the new field in XLogRecord, and renaming the variables and arguments
> > from info to rminfo.
> >
> > [tangent] With that refactoring, I also clean up a lot of code that
> > was using a wrong macro/constant for rmgr flags; `info &
> > ~XLR_INFO_MASK` may have the same value as `info &
> > XLR_RMGR_INFO_MASK`, but that's only guaranteed by the documentation;
> > and would require the same significant rework if new bits were
> > assigned to non-XLR_INFO_MASK and non-XLR_RMGR_INFO_MASK. [/tangent]
> >
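The tangent above can be checked mechanically: the two expressions coincide only because the two masks happen to partition the one-byte field between them. A quick sketch (mask values assumed from xlogrecord.h, not taken from this thread):

```python
# Assumed mask values, mirroring src/include/access/xlogrecord.h
XLR_INFO_MASK = 0x0F
XLR_RMGR_INFO_MASK = 0xF0

def rmgr_bits_via_complement(info):
    # what `info & ~XLR_INFO_MASK` computes for a one-byte xl_info field
    return info & (~XLR_INFO_MASK & 0xFF)

def rmgr_bits_via_mask(info):
    # the explicit form, `info & XLR_RMGR_INFO_MASK`
    return info & XLR_RMGR_INFO_MASK

# The two agree for every byte value -- but only because the masks
# happen to cover all eight bits between them with no gap.
assert all(rmgr_bits_via_complement(i) == rmgr_bits_via_mask(i)
           for i in range(256))
```

If a new flag bit were ever assigned to neither mask, the two expressions would diverge, which is exactly the fragility the refactoring removes.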
> > 0002 grew a bit as well, but not to a degree that I think is worrying
> > or otherwise impossible to review.
>
> Hi
>
> This entry was marked as "Needs review" in the CommitFest app but cfbot
> reports the patch no longer applies.
>
> We've marked it as "Waiting on Author". As CommitFest 2022-11 is
> currently underway, this would be an excellent time to update the patch.
>
> Once you think the patchset is ready for review again, you (or any
> interested party) can move the patch entry forward by visiting
>
> https://commitfest.postgresql.org/40/3882/
>
> and changing the status to "Needs review".

I was not sure whether you are planning to post an updated version of
the patch, which has been awaiting your attention since the last
commitfest. Please post an updated version soon, or update the
commitfest entry accordingly.

Regards,
Vignesh




Re: Inconsistency in vacuum behavior

2023-01-16 Thread Nikita Malakhov
Hi!

Here's the patch that fixes this case; please check it out.
The patch adds a vacuum_is_permitted_for_relation() check before adding a
partition relation to the vacuum list. If permission is denied, the
relation is not added, so it is never passed to vacuum_rel() and no
attempt is made to acquire the lock.
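The ordering matters here: checking permissions before taking the per-relation lock is what prevents the hang. A toy model (illustrative only, not PostgreSQL code; names are borrowed from the thread) of the fixed behavior:

```python
# Toy model: expanding a partitioned table for VACUUM.  Session 1 is
# imagined to hold a conflicting lock on vacuum_tab_1; checking
# permissions *before* locking means an unprivileged session 2 skips
# the partitions instead of blocking on that lock.
locked_by_other_session = {"vacuum_tab_1"}

def try_lock(rel):
    # stand-in for vacuum_open_relation(); blocking is modeled as a failure
    assert rel not in locked_by_other_session, f"session would block on {rel}"

def expand_vacuum_rel(parent, partitions, is_permitted):
    to_vacuum = []
    for rel in [parent] + partitions:
        if not is_permitted(rel):
            print(f'WARNING: permission denied to vacuum "{rel}", skipping it')
            continue
        try_lock(rel)            # only reached when permitted
        to_vacuum.append(rel)
    return to_vacuum

# Unprivileged role: every relation is skipped with a warning, and no
# lock acquisition is ever attempted -- so nothing can hang.
result = expand_vacuum_rel("vacuum_tab", ["vacuum_tab_1", "vacuum_tab_2"],
                           is_permitted=lambda rel: False)
assert result == []
```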

Cheers!

On Mon, Jan 16, 2023 at 4:48 PM Nikita Malakhov  wrote:

> Hi!
>
> I've checked expand_vacuum_rel() and made a quick fix for this. Here's
> the result of the test:
>
> postgres@postgres=# set role regress_vacuum_conflict;
> SET
> Time: 0.369 ms
> postgres@postgres=> vacuum vacuum_tab;
> WARNING:  permission denied to vacuum "vacuum_tab", skipping it
> WARNING:  permission denied to vacuum "vacuum_tab_1", skipping it
> WARNING:  permission denied to vacuum "vacuum_tab_2", skipping it
> VACUUM
> Time: 0.936 ms
> postgres@postgres=>
>
> Looks like it's a subject for a patch.
>
> On Mon, Jan 16, 2023 at 11:18 AM Alexander Pyhalov <
> a.pyha...@postgrespro.ru> wrote:
>
>> Hi.
>>
>> We've run regress isolation tests on partitioned tables and found
>> interesting VACUUM behavior. I'm not sure if it's intended.
>>
>> In the following example, partitioned tables and regular tables behave
>> differently:
>>
>> CREATE TABLE vacuum_tab (a int) PARTITION BY HASH (a);
>> CREATE TABLE vacuum_tab_1 PARTITION OF vacuum_tab FOR VALUES WITH
>> (MODULUS 2, REMAINDER 0);
>> CREATE TABLE vacuum_tab_2 PARTITION OF vacuum_tab FOR VALUES WITH
>> (MODULUS 2, REMAINDER 1);
>> CREATE ROLE regress_vacuum_conflict;
>>
>> In the first session:
>>
>> begin;
>>   LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
>>
>> In the second:
>> SET ROLE regress_vacuum_conflict;
>>   VACUUM vacuum_tab;
>>   WARNING:  permission denied to vacuum "vacuum_tab", skipping it  <--
>> hangs here, trying to lock vacuum_tab_1
>>
>> In the non-partitioned case the second session exits after emitting a
>> warning. In the partitioned case, it hangs, trying to acquire locks.
>> This is due to the fact that in expand_vacuum_rel() we skip the parent
>> table if vacuum_is_permitted_for_relation() fails, but don't perform such
>> a check for its children.
>> The check will be performed later in vacuum_rel(), but after
>> vacuum_open_relation(), which leads to the hang in the second session.
>>
>> Is it intended? Why don't we perform vacuum_is_permitted_for_relation()
>> check for inheritors in expand_vacuum_rel()?
>>
>> --
>> Best regards,
>> Alexander Pyhalov,
>> Postgres Professional
>>
>>
>>
>
> --
> Regards,
> Nikita Malakhov
> Postgres Professional
> https://postgrespro.ru/
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


0001_expand_vac_rel_part_chk_v1.patch
Description: Binary data


Re: Generating code for query jumbling through gen_node_support.pl

2023-01-16 Thread Peter Eisentraut

On 13.01.23 08:54, Michael Paquier wrote:

Using a "jumble_ignore" flag is equally invasive to using a
"jumble_include" flag for each field, I am afraid, as the number of
fields in the nodes included in jumbles is pretty equivalent to the
number of fields ignored.  I tend to prefer the approach of ignoring
things though, which is more consistent with the practice for node
read, write and copy.


I concur that jumble_ignore is better.  I suppose you placed the
jumble_ignore markers to maintain parity with the existing code, but I
think that some of the markers are actually wrong and are just errors of
omission in the existing code (such as Query.override, to pick a random
example).  The way you have structured this would allow us to find and
analyze such errors better.



Anyway, when it comes to the location, another thing that can be
considered here would be to require a node-level flag for the nodes on
which we want to track the location.  This overlaps a bit with the
fields satisfying "($t eq 'int' && $f =~ 'location$')", but it removes
most of the code changes like this one as at the end we only care
about the location for Const nodes:
-   int location;   /* token location, or -1 if unknown */
+   int location pg_node_attr(jumble_ignore);   /* token location, or -1
+* if unknown */

I have taken this approach in v2 of the patch, shaving ~100 lines of
more code as there is no need to mark all these location fields with
"jumble_ignore" anymore, except for Const, of course.


I don't understand why you chose that when we already have an 
established way.  This would just make the jumble annotations 
inconsistent with the other annotations.


I also have two suggestions to make this patch more palatable:

1. Make a separate patch to reformat long comments, like 
835d476fd21bcfb60b055941dee8c3d9559af14c.


2. Make a separate patch to move the jumble support to 
src/backend/nodes/jumblefuncs.c.  (It was probably a mistake that it 
wasn't there to begin with, and some of the errors of omission alluded 
to above are probably caused by it having been hidden away in the wrong 
place.)
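As a toy illustration of what these jumble_ignore annotations control (a sketch, not the pg_stat_statements implementation; the field names and hash are made up): fields tagged as ignored, token locations in particular, contribute nothing to the hash, so queries differing only in token positions jumble identically.

```python
# Toy model: jumble a node tree, skipping fields tagged jumble_ignore.
# In PostgreSQL the ignored fields notably include token locations, so
# queries that differ only in layout hash to the same query identifier.
import hashlib

IGNORED = {"location"}   # stand-in for pg_node_attr(jumble_ignore)

def jumble(node):
    h = hashlib.sha256()
    for field in sorted(node):
        if field in IGNORED:
            continue
        value = node[field]
        if isinstance(value, dict):        # recurse into child nodes
            h.update(jumble(value).encode())
        else:
            h.update(repr((field, value)).encode())
    return h.hexdigest()

# Two expression trees identical except for token locations:
q1 = {"type": "OpExpr", "opno": 96, "location": 7,
      "arg": {"type": "Var", "varattno": 1, "location": 7}}
q2 = {"type": "OpExpr", "opno": 96, "location": 42,
      "arg": {"type": "Var", "varattno": 1, "location": 42}}
assert jumble(q1) == jumble(q2)   # locations don't affect the jumble
```

Const is the special case the thread discusses: its location is the one that must still be tracked, so that the query text can later be normalized.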






Re: Inconsistency in vacuum behavior

2023-01-16 Thread Nikita Malakhov
Hi!

I've checked expand_vacuum_rel() and made a quick fix for this. Here's
the result of the test:

postgres@postgres=# set role regress_vacuum_conflict;
SET
Time: 0.369 ms
postgres@postgres=> vacuum vacuum_tab;
WARNING:  permission denied to vacuum "vacuum_tab", skipping it
WARNING:  permission denied to vacuum "vacuum_tab_1", skipping it
WARNING:  permission denied to vacuum "vacuum_tab_2", skipping it
VACUUM
Time: 0.936 ms
postgres@postgres=>

Looks like it's a subject for a patch.

On Mon, Jan 16, 2023 at 11:18 AM Alexander Pyhalov 
wrote:

> Hi.
>
> We've run regress isolation tests on partitioned tables and found
> interesting VACUUM behavior. I'm not sure if it's intended.
>
> In the following example, partitioned tables and regular tables behave
> differently:
>
> CREATE TABLE vacuum_tab (a int) PARTITION BY HASH (a);
> CREATE TABLE vacuum_tab_1 PARTITION OF vacuum_tab FOR VALUES WITH
> (MODULUS 2, REMAINDER 0);
> CREATE TABLE vacuum_tab_2 PARTITION OF vacuum_tab FOR VALUES WITH
> (MODULUS 2, REMAINDER 1);
> CREATE ROLE regress_vacuum_conflict;
>
> In the first session:
>
> begin;
>   LOCK vacuum_tab IN SHARE UPDATE EXCLUSIVE MODE;
>
> In the second:
> SET ROLE regress_vacuum_conflict;
>   VACUUM vacuum_tab;
>   WARNING:  permission denied to vacuum "vacuum_tab", skipping it  <--
> hangs here, trying to lock vacuum_tab_1
>
> In the non-partitioned case the second session exits after emitting a
> warning. In the partitioned case, it hangs, trying to acquire locks.
> This is due to the fact that in expand_vacuum_rel() we skip the parent
> table if vacuum_is_permitted_for_relation() fails, but don't perform such
> a check for its children.
> The check will be performed later in vacuum_rel(), but after
> vacuum_open_relation(), which leads to the hang in the second session.
>
> Is it intended? Why don't we perform vacuum_is_permitted_for_relation()
> check for inheritors in expand_vacuum_rel()?
>
> --
> Best regards,
> Alexander Pyhalov,
> Postgres Professional
>
>
>

-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: Make EXPLAIN generate a generic plan for a parameterized query

2023-01-16 Thread Jim Jones

Hi Laurenz,

I'm testing your patch, and the GENERIC_PLAN parameter seems to work
just fine ...


db=# CREATE TABLE t (col numeric);
CREATE TABLE
db=# CREATE INDEX t_col_idx ON t (col);
CREATE INDEX
db=# INSERT INTO t SELECT random() FROM generate_series(1,10) ;
INSERT 0 10
db=# EXPLAIN (GENERIC_PLAN) SELECT * FROM t WHERE col = $1;
    QUERY PLAN
---
 Bitmap Heap Scan on t  (cost=15.27..531.67 rows=368 width=32)
   Recheck Cond: (col = $1)
   ->  Bitmap Index Scan on t_col_idx  (cost=0.00..15.18 rows=368 width=0)
 Index Cond: (col = $1)
(4 rows)


.. the error message when combining GENERIC_PLAN with ANALYSE also works 
as expected


db=# EXPLAIN (ANALYSE, GENERIC_PLAN) SELECT * FROM t WHERE col = $1;
ERROR:  EXPLAIN ANALYZE cannot be used with GENERIC_PLAN

.. and the system also does not throw an error when it's used along
with other parameters, e.g. VERBOSE, WAL, SUMMARY, etc.


However, when GENERIC_PLAN is combined with BUFFERS, the 'Buffers'
node is shown only the first time the query is executed in a session:


psql (16devel)
Type "help" for help.

postgres=# \c db
You are now connected to database "db" as user "postgres".
db=# EXPLAIN (BUFFERS, GENERIC_PLAN) SELECT * FROM t WHERE col = $1;
   QUERY PLAN
-
 Index Only Scan using t_col_idx on t  (cost=0.42..4.44 rows=1 width=11)
   Index Cond: (col = $1)
 Planning:
   Buffers: shared hit=62
(4 rows)

db=# EXPLAIN (BUFFERS, GENERIC_PLAN) SELECT * FROM t WHERE col = $1;
   QUERY PLAN
-
 Index Only Scan using t_col_idx on t  (cost=0.42..4.44 rows=1 width=11)
   Index Cond: (col = $1)
(2 rows)

db=# EXPLAIN (BUFFERS, GENERIC_PLAN) SELECT * FROM t WHERE col = $1;
   QUERY PLAN
-
 Index Only Scan using t_col_idx on t  (cost=0.42..4.44 rows=1 width=11)
   Index Cond: (col = $1)
(2 rows)

Is it the expected behaviour?

Also, this new parameter seems to work only within parentheses, as in
`(GENERIC_PLAN)`:


db=# EXPLAIN GENERIC_PLAN SELECT * FROM t WHERE col = $1;
ERROR:  syntax error at or near "GENERIC_PLAN"
LINE 1: EXPLAIN GENERIC_PLAN SELECT * FROM t WHERE col = $1;


If it's intended to be consistent with the other "single parameters",
perhaps it should also work without parentheses? e.g.


db=# EXPLAIN ANALYSE SELECT * FROM t WHERE col < 0.42;
  QUERY PLAN
--
 Index Only Scan using t_col_idx on t  (cost=0.42..1637.25 rows=41876 
width=11) (actual time=0.103..6.293 rows=41932 loops=1)

   Index Cond: (col < 0.42)
   Heap Fetches: 0
 Planning Time: 0.071 ms
 Execution Time: 7.316 ms
(5 rows)


db=# EXPLAIN VERBOSE SELECT * FROM t WHERE col < 0.42;
  QUERY PLAN
---
 Index Only Scan using t_col_idx on public.t (cost=0.42..1637.25 
rows=41876 width=11)

   Output: col
   Index Cond: (t.col < 0.42)
(3 rows)


On a very personal note: wouldn't just GENERIC (without _PLAN) suffice? 
Don't bother with it if you disagree :-)


Cheers
Jim

On 09.01.23 17:40, Laurenz Albe wrote:

On Tue, 2022-12-27 at 14:37 -0800, Michel Pelletier wrote:

I built and tested this patch for review and it works well, although I got the 
following warning when building:

analyze.c: In function 'transformStmt':
analyze.c:2919:35: warning: 'generic_plan' may be used uninitialized in this 
function [-Wmaybe-uninitialized]
  2919 |         pstate->p_generic_explain = generic_plan;
       |         ~~^~
analyze.c:2909:25: note: 'generic_plan' was declared here
  2909 |         bool            generic_plan;
       |                         ^~~~

Thanks for checking.  The variable should indeed be initialized, although
my compiler didn't complain.

Attached is a fixed version.

Yours,
Laurenz Albe





Re: Record queryid when auto_explain.log_verbose is on

2023-01-16 Thread Julien Rouhaud
Hi,

On Mon, Jan 16, 2023 at 09:36:59PM +0900, torikoshia wrote:
>
> As far as I read the manual below, auto_explain.log_verbose should record
> logs equivalent to the VERBOSE option of EXPLAIN.

Ah good catch, that's clearly an oversight!

> Attached patch makes auto_explain also print query identifiers.
>
> What do you think?

@@ -407,6 +408,9 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 				ExplainPrintTriggers(es, queryDesc);
 			if (es->costs)
 				ExplainPrintJITSummary(es, queryDesc);
+			if (es->verbose && queryDesc->plannedstmt->queryId != UINT64CONST(0))
+				ExplainPropertyInteger("Query Identifier", NULL, (int64)
+									   queryDesc->plannedstmt->queryId, es);

For interactive EXPLAIN the query identifier is printed just after the plan,
before the triggers and the JIT summary, so auto_explain should do the same.

Other than that looks good to me.




Re: PL/Python: Fix return in the middle of PG_TRY() block.

2023-01-16 Thread Xing Guo
Hi,

I revised my patch and added the missing case that Nathan mentioned.

Is there any unsafe code in pltcl.c? The return statement there is in the
PG_CATCH() block; I think the exception stack has already been restored by
the time the PG_CATCH() block runs, so a return statement in it should be ok?

```
PG_TRY();
{
UTF_BEGIN;
ereport(level,
(errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
errmsg("%s", UTF_U2E(Tcl_GetString(objv[2])))));
UTF_END;
}
PG_CATCH();
{
ErrorData  *edata;

/* Must reset elog.c's state */
MemoryContextSwitchTo(oldcontext);
edata = CopyErrorData();
FlushErrorState();

/* Pass the error data to Tcl */
pltcl_construct_errorCode(interp, edata);
UTF_BEGIN;
Tcl_SetObjResult(interp, Tcl_NewStringObj(UTF_E2U(edata->message), -1));
UTF_END;
FreeErrorData(edata);

return TCL_ERROR;
}
PG_END_TRY();
```
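Why a `return` inside the PG_TRY() body itself is the dangerous case can be sketched with a toy model (illustrative only, not the elog.h macros): PG_TRY() pushes a jump target onto a stack that PG_END_TRY() pops, so an early return skips the pop and leaves a dangling entry pointing into a dead stack frame.

```python
# Toy model of the PG_TRY()/PG_END_TRY() push/pop discipline.
exception_stack = []

def buggy_function():
    exception_stack.append("jmp_buf in buggy_function's frame")  # PG_TRY()
    return None            # early return: PG_END_TRY() never runs!
    exception_stack.pop()  # unreachable

def ok_function():
    exception_stack.append("jmp_buf in ok_function's frame")     # PG_TRY()
    result = None
    exception_stack.pop()                                        # PG_END_TRY()
    return result

buggy_function()
assert len(exception_stack) == 1   # stale entry left behind
exception_stack.clear()

ok_function()
assert len(exception_stack) == 0   # balanced push/pop
```

The PG_CATCH() branch is different: by the time the handler body runs, the macros have already restored the saved stack pointers, which is why the pltcl return above is arguably safe.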

Best Regards,
Xing








On Sat, Jan 14, 2023 at 2:03 AM Nathan Bossart 
wrote:

> On Thu, Jan 12, 2023 at 09:49:00PM -0800, Andres Freund wrote:
> > On 2023-01-12 10:44:33 -0800, Nathan Bossart wrote:
> >> There's another "return" later on in this PG_TRY block.  I wonder if
> it's
> >> possible to detect this sort of thing at compile time.
> >
> > Clang provides some annotations that allow to detect this kind of thing.
> I
> > hacked up a test for this, and it finds quite a bit of prolematic
> > code.
>
> Nice!
>
> > plpython is, uh, not being good? But also in plperl, pltcl.
>
> Yikes.
>
> > ../../../../home/andres/src/postgresql/src/pl/tcl/pltcl.c:1830:1:
> warning: no_returns_in_pg_try 'no_returns_handle' is not held on every path
> through here [-Wthread-safety-analysis]
> > }
> > ^
> > ../../../../home/andres/src/postgresql/src/pl/tcl/pltcl.c:1809:2: note:
> no_returns_in_pg_try acquired here
> > PG_CATCH();
> > ^
> > ../../../../home/andres/src/postgresql/src/include/utils/elog.h:433:7:
> note: expanded from macro 'PG_CATCH'
> > no_returns_start(no_returns_handle##__VA_ARGS__)
> > ^
> >
> > Not perfectly digestible, but also not too bad. I pushed the
> > no_returns_start()/no_returns_stop() calls into all the PG_TRY related
> macros,
> > because that causes the warning to point to block that the problem is
> > in. E.g. above the first warning points to PG_TRY, the second to
> > PG_CATCH. It'd work to just put it into PG_TRY and PG_END_TRY.
>
> This seems roughly as digestible as the pg_prevent_errno_in_scope stuff.
> However, on my macOS machine with clang 14.0.0, the messages say "mutex"
> instead of "no_returns_in_pg_try," which is unfortunate since that's the
> part that would clue me into what the problem is.  I suppose it'd be easy
> enough to figure out after a grep or two, though.
>
> > Clearly this would need a bunch more work, but it seems promising? I
> think
> > there'd be other uses than this.
>
> +1
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com
>
diff --git a/src/pl/plpython/plpy_exec.c b/src/pl/plpython/plpy_exec.c
index 923703535a..7181b5e898 100644
--- a/src/pl/plpython/plpy_exec.c
+++ b/src/pl/plpython/plpy_exec.c
@@ -414,12 +414,12 @@ PLy_function_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc)
 	PyObject   *volatile args = NULL;
 	int			i;
 
+	args = PyList_New(proc->nargs);
+	if (!args)
+		return NULL;
+
 	PG_TRY();
 	{
-		args = PyList_New(proc->nargs);
-		if (!args)
-			return NULL;
-
 		for (i = 0; i < proc->nargs; i++)
 		{
 			PLyDatumToOb *arginfo = &proc->args[i];
@@ -684,18 +684,28 @@ PLy_trigger_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc, HeapTuple *r
 			   *pltrelid,
 			   *plttablename,
 			   *plttableschema;
-	PyObject   *pltargs,
+	PyObject   *pltargs = NULL,
 			   *pytnew,
 			   *pytold;
 	PyObject   *volatile pltdata = NULL;
 	char	   *stroid;
 
-	PG_TRY();
+	pltdata = PyDict_New();
+	if (!pltdata)
+		return NULL;
+
+	if (tdata->tg_trigger->tgnargs)
 	{
-		pltdata = PyDict_New();
-		if (!pltdata)
+		pltargs = PyList_New(tdata->tg_trigger->tgnargs);
+		if (!pltargs)
+		{
+			Py_DECREF(pltdata);
 			return NULL;
+		}
+	}
 
+	PG_TRY();
+	{
 		pltname = PLyUnicode_FromString(tdata->tg_trigger->tgname);
 		PyDict_SetItemString(pltdata, "name", pltname);
 		Py_DECREF(pltname);
@@ -835,12 +845,6 @@ PLy_trigger_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc, HeapTuple *r
 			int			i;
 			PyObject   *pltarg;
 
-			pltargs = PyList_New(tdata->tg_trigger->tgnargs);
-			if (!pltargs)
-			{
-				Py_DECREF(pltdata);
-				return NULL;
-			}
 			for (i = 0; i < tdata->tg_trigger->tgnargs; i++)
 			{
 				pltarg = PLyUnicode_FromString(tdata->tg_trigger->tgargs[i]);


Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-01-16 Thread Aleksander Alekseev
Hi hackers,

> The proposed patchset changes the documentation and the error messages
> accordingly, making them less misleading. 0001 corrects the
> documentation but doesn't touch the code. 0002 and 0003 correct the
> messages shown when approaching xidWrapLimit and xidWarnLimit
> accordingly.

A colleague of mine, Oleksii Kliukin, pointed out that the
recommendation about running VACUUM in a single-user mode is also
outdated, as it was previously reported in [1]. I didn't believe it at
first and decided to double-check:

```
=# select * from phonebook;
 id |  name   | phone
+-+---
  1 | Alex|   123
  5 | Charlie |   789
  2 | Bob |   456
  6 | Ololo   |   789
(4 rows)

=# insert into phonebook values (7, 'Trololo', 987);
ERROR:  database is not accepting commands to avoid wraparound data
loss in database "template1"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions,
or drop stale replication slots.

=# VACUUM FREEZE;
VACUUM

=# insert into phonebook values (7, 'Trololo', 987);
INSERT 0 1

=# SELECT current_setting('wal_level');
 current_setting
-
 logical
```
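The limit that the error message enforces comes from circular 32-bit XID arithmetic. A toy sketch (not the server code; the constants and comparison mirror the idea behind TransactionIdPrecedes()):

```python
# Toy arithmetic: 32-bit XIDs compare on a circle of 2^32 values, so an
# XID can only be about 2^31 transactions "in the past" before it would
# flip around and appear to be in the future -- the wraparound the
# "database is not accepting commands" error protects against.
TWO_31, TWO_32 = 2**31, 2**32

def xid_precedes(a, b):
    # circular "a is older than b", like (int32)(a - b) < 0 in C
    return (a - b) % TWO_32 >= TWO_31

oldest_xid = 100
wrap_limit = (oldest_xid + TWO_31 - 1) % TWO_32

assert xid_precedes(oldest_xid, wrap_limit)           # still safely "older"
assert not xid_precedes(oldest_xid, wrap_limit + 2)   # past the limit: wrapped
```

Any VACUUM that freezes old tuples advances oldest_xid and thus pushes the limit forward, which is why single-user mode is not actually required.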

Unfortunately the [1] discussion went nowhere. So I figured it would
be appropriate to add corresponding changes to the proposed patchset
since it's relevant and is registered in the CF app already. PFA
patchset v2 which now also includes 0004.

[1]: 
https://www.postgresql.org/message-id/flat/CAMT0RQTmRj_Egtmre6fbiMA9E2hM3BsLULiV8W00stwa3URvzA%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev


v2-0001-Correct-the-docs-about-preventing-XID-wraparound.patch
Description: Binary data


v2-0003-Fix-the-message-in-case-of-exceeding-xidWarnLimit.patch
Description: Binary data


v2-0004-Don-t-recommend-running-VACUUM-in-a-single-user-m.patch
Description: Binary data


v2-0002-Fix-the-message-in-case-of-approaching-xidWrapLim.patch
Description: Binary data


Record queryid when auto_explain.log_verbose is on

2023-01-16 Thread torikoshia

Hi,

As far as I read the manual below, auto_explain.log_verbose should
record logs equivalent to the VERBOSE option of EXPLAIN.



-- https://www.postgresql.org/docs/devel/auto-explain.html
auto_explain.log_verbose controls whether verbose details are printed 
when an execution plan is logged; it's equivalent to the VERBOSE option 
of EXPLAIN.


However, when compute_query_id is on, query identifiers are only printed
when using the VERBOSE option of EXPLAIN, not by auto_explain.


EXPLAIN VERBOSE:
```
=# show auto_explain.log_verbose;
 auto_explain.log_verbose
--
 on
(1 row)

=# show compute_query_id;
 compute_query_id
--
 on
(1 row)

=# explain verbose select 1;
QUERY PLAN
--
 Result  (cost=0.00..0.01 rows=1 width=4)
   Output: 1
 Query Identifier: -1801652217649936326
(3 rows)
```

auto_explain:
```
LOG:  0: duration: 0.000 ms  plan:
Query Text: explain verbose select 1;
Result  (cost=0.00..0.01 rows=1 width=4)
  Output: 1
```

Attached patch makes auto_explain also print query identifiers.

What do you think?

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 0e2612a1c7762b64357c85ce04e62b5ba0cdb4f7 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 16 Jan 2023 20:41:09 +0900
Subject: [PATCH v1] Make auto_explain record query identifier

When auto_explain.log_verbose is on, auto_explain should record logs
equivalent to the VERBOSE option of EXPLAIN. However, when
compute_query_id is on, query identifiers are only printed with the
VERBOSE option of EXPLAIN.
This patch makes auto_explain also print the query identifier.
---
 contrib/auto_explain/auto_explain.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index c3ac27ae99..79d4ad6785 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -21,6 +21,7 @@
 #include "jit/jit.h"
 #include "nodes/params.h"
 #include "utils/guc.h"
+#include "utils/queryjumble.h"
 
 PG_MODULE_MAGIC;
 
@@ -407,6 +408,9 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 ExplainPrintTriggers(es, queryDesc);
 			if (es->costs)
 ExplainPrintJITSummary(es, queryDesc);
+			if (es->verbose && queryDesc->plannedstmt->queryId != UINT64CONST(0))
+ExplainPropertyInteger("Query Identifier", NULL, (int64)
+	   queryDesc->plannedstmt->queryId, es);
 			ExplainEndOutput(es);
 
 			/* Remove last line break */

base-commit: 1561612e3bf3264c31618b9455d0c1003b9271ec
-- 
2.25.1



Re: Rethinking the implementation of ts_headline()

2023-01-16 Thread Alvaro Herrera
On 2022-Nov-25, Tom Lane wrote:

> After further contemplation of bug #17691 [1], I've concluded that
> what I did in commit c9b0c678d was largely misguided.  For one
> thing, the new hlCover() algorithm no longer finds shortest-possible
> cover strings: if your query is "x & y" and the text is like
> "... x ... x ... y ...", then the selected cover string will run
> from the first occurrence of x to the y, whereas the old algorithm
> would have correctly selected "x ... y".  For another thing, the
> maximum-cover-length hack that I added in 78e73e875 to band-aid
> over the performance issues of the original c9b0c678d patch means
> that various scenarios no longer work as well as they used to,
> which is the proximate cause of the complaints in bug #17691.

I came across #17556 which contains a different test for this, and I'm
not sure that this patch changes things completely for the better.  In
that bug report, Alex Malek presents this example

select ts_headline('baz baz baz ipsum ' || repeat(' foo ',4998) || 'labor',
   $$'ipsum' & 'labor'$$::tsquery,
   'StartSel={, StopSel=}, MaxFragments=100, MaxWords=7, MinWords=3'),
ts_headline('baz baz baz ipsum ' || repeat(' foo ',4999) || 'labor',
   $$'ipsum' & 'labor'$$::tsquery,
   'StartSel={, StopSel=}, MaxFragments=100, MaxWords=7, MinWords=3');

which returns, in the current HEAD, the following
 ts_headline │ ts_headline 
─┼─
 {ipsum} ... {labor} │ baz baz baz
(1 fila)

That is, once past the 5000 words of distance, it fails to find a good
cover, but before that it returns an acceptable headline.  However,
after your proposed patch, we get this:

 ts_headline │ ts_headline 
─┼─
 {ipsum} │ {ipsum}
(1 fila)

which is an improvement in the second case, though perhaps not as much
as we would like, and definitely not an improvement in the first case.
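The "shortest-possible cover" behavior that regressed can be sketched directly: for query 'x & y' over "... x ... x ... y ...", the minimal window runs from the second x to the y. An illustrative Python sketch (a toy, not the actual tsearch hlCover() code):

```python
def shortest_cover(words, query_terms):
    """Smallest window [i, j] of `words` containing every query term."""
    best = None
    for j, w in enumerate(words):
        if w not in query_terms:
            continue
        # walk backwards from j until the window [i, j] covers all terms
        need = set(query_terms)
        for i in range(j, -1, -1):
            need.discard(words[i])
            if not need:
                if best is None or j - i < best[1] - best[0]:
                    best = (i, j)
                break
    return best

words = ["a", "x", "b", "x", "c", "y", "d"]
# The minimal cover starts at the *second* x (index 3), not the first:
assert shortest_cover(words, {"x", "y"}) == (3, 5)
```

An algorithm that latches onto the first occurrence of x instead would return the wider (1, 5) window, which is the defect described at the top of the thread.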

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"If you have nothing to say, maybe you need just the right tool to help you
not say it."   (New York Times, about Microsoft PowerPoint)




Re: Using AF_UNIX sockets always for tests on Windows

2023-01-16 Thread Juan José Santamaría Flecha
Hello,

On Fri, Dec 2, 2022 at 1:03 AM Thomas Munro  wrote:

>
> 1.  Teach mkdtemp() to make a non-world-accessible directory.  This is
> required to be able to make a socket that other processes can't
> connect to, to match the paranoia level used on Unix.  This was
> written just by reading documentation, because I am not a Windows
> user, so I would be grateful for a second opinion and/or testing from
> a Windows hacker, which would involve testing with two different
> users.  The idea is that Windows' mkdir() is completely ignoring the
> permissions (we can see in the mingw headers that it literally throws
> away the mode argument), so we shouldn't use that, but native
> CreateDirectory() when given a pointer to a SECURITY_ATTRIBUTES with
> lpSecurityDesciptor set to NULL should only allow the current user to
> access the object (directory).  Does this really work, and would it be
> better to create some more explicit private-keep-out
> SECURITY_ATTRIBUTE, and how would that look?
>

A directory created with a NULL SECURITY_ATTRIBUTES inherits the ACL from
its parent directory [1]. In this case, its parent is the designated
temporary location, which should already have limited access.
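For reference, the Unix behavior the patch is trying to match is that mkdtemp() creates the directory owner-only, so group/other users cannot reach a socket inside it. A quick sketch of that baseline (assuming a POSIX system):

```python
# On Unix, mkdtemp() creates its directory with mode 0700, so no other
# user can traverse into it to reach a socket placed inside -- the level
# of paranoia the Windows patch aims to reproduce via the DACL.
import os
import stat
import tempfile

d = tempfile.mkdtemp()
try:
    mode = stat.S_IMODE(os.stat(d).st_mode)
    assert mode & 0o077 == 0   # no group/other access bits at all
finally:
    os.rmdir(d)
```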

You can create an explicit DACL for that directory; PFA a patch doing so.
This is just an example, not something that I'm proposing as a committable
alternative.

> I'm fairly sure that filesystem permissions must be enough to stop
> another OS user from connecting, because it's clear from documentation
> that AF_UNIX works on Windows by opening the file and reading some
> magic "reparse" data from inside it, so if you can't see into the
> containing directory, you can't do it.  This patch is the one the rest
> are standing on, because the tests should match Unix in their level of
> security.
>

Yes, this is correct.


Only the first patch is modified, but I'm including all of them so they go
through the cfbot.

[1]
https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilea


Regards,

Juan José Santamaría Flecha


v2-0004-Doc-Abstract-AF_UNIX-sockets-don-t-work-on-Windows-a.patch
Description: Binary data


v2-0003-WIP-Stop-using-TCP-in-TAP-tests-on-Windows.patch
Description: Binary data


v2-0002-WIP-Always-use-Unix-domain-sockets-in-pg_regress-on-.patch
Description: Binary data


v2-0001-WIP-Make-mkdtemp-more-secure-on-Windows.patch
Description: Binary data


Re: Small miscellaneus fixes (Part II)

2023-01-16 Thread Ranier Vilela
On Mon, 16 Jan 2023 at 03:28, John Naylor <
john.nay...@enterprisedb.com> wrote:

>
> I wrote:
> > ...but arguably the earlier check is close enough that it's silly to
> assert in the "else" branch, and I'd be okay with just nuking those lines.
> Another thing that caught my attention is the assumption that unsetting a
> bit is so expensive that we have to first check if it's set, so we may as
> well remove "IS_BRACKET(Np->Num)" as well.
>
> The attached is what I mean. I'll commit this this week unless there are
> objections.
>
+1 looks good to me.

regards,
Ranier Vilela


Re: Allow tailoring of ICU locales with custom rules

2023-01-16 Thread Peter Eisentraut

On 11.01.23 03:50, vignesh C wrote:

On Thu, 5 Jan 2023 at 20:45, Peter Eisentraut
 wrote:


Patch needed a rebase; no functionality changes.


The patch does not apply on top of HEAD as in [1], please post a rebased patch:


Updated patch attached.

From 8744abe8e56e25b8d76d1201c4fa40af273a09de Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 14 Dec 2022 10:15:03 +0100
Subject: [PATCH v3] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.

Discussion: 
https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f7...@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml| 18 +++
 doc/src/sgml/ref/create_collation.sgml| 22 +
 doc/src/sgml/ref/create_database.sgml | 12 +
 src/backend/catalog/pg_collation.c|  5 ++
 src/backend/commands/collationcmds.c  | 23 +++--
 src/backend/commands/dbcommands.c | 49 +--
 src/backend/utils/adt/pg_locale.c | 41 +++-
 src/backend/utils/init/postinit.c | 11 -
 src/include/catalog/pg_collation.h|  2 +
 src/include/catalog/pg_database.h |  3 ++
 src/include/utils/pg_locale.h |  1 +
 .../regress/expected/collate.icu.utf8.out | 30 
 src/test/regress/sql/collate.icu.utf8.sql | 13 +
 13 files changed, 220 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054..746baf5053 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ pg_collation 
Columns
   
  
 
+ 
+  
+   collicurules text
+  
+  
+   ICU collation rules for this collation object
+  
+ 
+
  
   
collversion text
@@ -3106,6 +3115,15 @@ pg_database 
Columns
   
  
 
+ 
+  
+   daticurules text
+  
+  
+   ICU collation rules for this database
+  
+ 
+
  
   
datcollversion text
diff --git a/doc/src/sgml/ref/create_collation.sgml 
b/doc/src/sgml/ref/create_collation.sgml
index 58f5f0cd63..2c7266107e 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
 [ LC_CTYPE = lc_ctype, ]
 [ PROVIDER = provider, ]
 [ DETERMINISTIC = boolean, ]
+[ RULES = rules, ]
 [ VERSION = version ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] name FROM 
existing_collation
@@ -149,6 +150,19 @@ Parameters
  
 
 
+
+ rules
+
+ 
+  
+   Specifies additional collation rules to customize the behavior of the
+   collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+   for details on the syntax.
+  
+ 
+
+
 
  version
 
@@ -228,6 +242,14 @@ Examples
 
   
 
+  
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+
+
+
+  
+
   
To create a collation from an existing collation:
 
diff --git a/doc/src/sgml/ref/create_database.sgml 
b/doc/src/sgml/ref/create_database.sgml
index 2f034e2859..d6cc5646fa 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -192,6 +192,18 @@ Parameters
   
  
 
+ 
+  icu_rules
+  
+   
+Specifies additional collation rules to customize the behavior of the
+collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+for details on the syntax.
+   
+  
+ 
+
  
   locale_provider
 
diff --git a/src/backend/catalog/pg_collation.c 
b/src/backend/catalog/pg_collation.c
index 287b13725d..fd022e6fc2 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -50,6 +50,7 @@ CollationCreate(const char *collname, Oid collnamespace,
int32 collencoding,
const char *collcollate, const char *collctype,
const char *colliculocale,
+   const char *collicurules,
const char *collversion,
bool if_not_exists,
bool quiet)
@@ -194,6 +195,10 @@ CollationCreate(const char *collname, Oid collnamespace,
values[Anum_pg_collation_colliculocale - 1] = 
CStringGetTextDatum(colliculocale);
else
nulls[Anum_pg_collation_colliculocale - 1] = true;
+   if (collicurules)
+   values[Anum_pg_collation_collicurules - 1] = 
CStringGetTextDatum(collicurules);
+   else
+   nulls[Anum_pg_collation_collicurules - 1] = true;
if (collversion)
values[Anum_pg_collation_collversion - 1] = 

Re: pgsql: Doc: add XML ID attributes to and tags.

2023-01-16 Thread Peter Eisentraut

On 12.01.23 00:05, Tom Lane wrote:

That reminds me that I was going to suggest fixing the few existing
variances from the "use '-' not '_'" policy:

$ grep 'id="[a-zA-Z0-9-]*_' *sgml ref/*sgml
config.sgml: 


should be fixed


libpq.sgml:
libpq.sgml:
libpq.sgml:
libpq.sgml:


I think we can leave these.  They are internally consistent.


pgbuffercache.sgml:  


should be fixed


ref/pg_checksums.sgml: 


pretty bogus





RE: Exit walsender before confirming remote flush in logical replication

2023-01-16 Thread Hayato Kuroda (Fujitsu)
Dear Horiguchi-san,

> If I'm grabbing the discussion here correctly, in my memory, it is
> because: physical replication needs all records that have written on
> primary are written on standby for switchover to succeed. It is
> annoying that normal shutdown occasionally leads to switchover
> failure. Thus WalSndDone explicitly waits for remote flush/write
> regardless of the setting of synchronous_commit.

AFAIK the condition (sentPtr == replicatedPtr) was introduced for that
purpose[1]. You meant that the condition (!pq_is_send_pending()) has the
same motivation, right?

> Thus apply delay
> doesn't affect shutdown (AFAICS), and that is sufficient since all the
> records will be applied at the next startup.

I was not sure what "next startup" meant, but I agree that we can shut down
the walsender in the case of recovery_min_apply_delay > 0 and
synchronous_commit = remote_apply. The startup process will not be terminated
even if the primary crashes, so I think it will apply the transaction sooner
or later.

> In logical replication apply precedes write and flush so we have no
> indication whether a record is "replicated" to standby other than the
> apply LSN. On the other hand, logical replication doesn't have any
> business with switchover, so that assurance is useless. Thus I think
> we can (practically) ignore apply_lsn at shutdown. It seems subtly
> irregular, though.

Another consideration is that the condition (!pq_is_send_pending()) ensures
that there are no pending messages, including other packets. Currently we
force walsenders to clean up all messages before shutting down, even
keepalives. I cannot see any problems caused by this, but I kept the
condition in the case of logical replication.

I updated the patch accordingly. Also, I found that the previous version
did not work well in the case of streamed transactions. When a streamed
transaction is committed on the publisher but its application is delayed on
the subscriber, the process sometimes waits until there is no pending write.
This is done in ProcessPendingWrites(). I added another termination path to
that function.

[1]: 
https://github.com/postgres/postgres/commit/985bd7d49726c9f178558491d31a570d47340459

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v4-0001-Exit-walsender-before-confirming-remote-flush-in-.patch
Description:  v4-0001-Exit-walsender-before-confirming-remote-flush-in-.patch


RE: Exit walsender before confirming remote flush in logical replication

2023-01-16 Thread Hayato Kuroda (Fujitsu)
Dear Horiguchi-san, Amit,

> At Fri, 13 Jan 2023 16:41:08 +0530, Amit Kapila 
> wrote in
> > Okay, but what happens in the case of physical replication when
> > synchronous_commit = remote_apply? In that case, won't it ensure that
> > apply has also happened? If so, then shouldn't the time delay feature
> > also cause a similar problem for physical replication as well?
> 
> As written in another mail, WalSndDone doesn't honor
> synchronous_commit. In other words, AFAIS walsender finishes without
> waiting for remote_apply.  The unapplied records will be applied at the
> next startup.
> 
> I didn't confirm that behavior myself, though..

If Amit meant the case where sending data is pending in physical
replication, then the walsender cannot stop. But this is not related to
synchronous_commit: it happens because the walsender must sweep all pending
data before shutting down. We can reproduce the situation with:

1. build streaming replication system
2. kill -STOP $walreceiver
3. insert data to primary server
4. try to stop the primary server

If what you said was not related to pending data, the walsender can be
stopped even if synchronous_commit = remote_apply. As Horiguchi-san said,
such a condition is not written in WalSndDone() [1]. I think the parameter
synchronous_commit does not affect the walsender process much; it just
defines when the backend returns the result to the client.

I confirmed this with the following steps:

1. built a streaming replication system. PSA the script that does this.

Primary config.

```
synchronous_commit = 'remote_apply'
synchronous_standby_names = 'secondary'
```

Secondary config.

```
recovery_min_apply_delay = 1d
primary_conninfo = 'user=postgres port=$port_N1 application_name=secondary'
hot_standby = on
```

2. inserted data into the primary. This waited for the remote apply:

psql -U postgres -p $port_primary -c "INSERT INTO tbl SELECT generate_series(1, 5000)"

3. Stopped the primary server from another terminal. It succeeded.
The terminal from step 2 said:

```
WARNING:  canceling the wait for synchronous replication and terminating connection due to administrator command
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
```

[1]: 
https://github.com/postgres/postgres/blob/master/src/backend/replication/walsender.c#L3121

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



test_phy.sh
Description: test_phy.sh


Re: Refactor recordExtObjInitPriv()

2023-01-16 Thread Peter Eisentraut

On 12.01.23 18:40, Nathan Bossart wrote:

On Thu, Jan 12, 2023 at 12:20:50PM -0500, Tom Lane wrote:

Peter Eisentraut  writes:

On 12.01.23 01:04, Nathan Bossart wrote:
-classoid == AggregateRelationId ||

I noticed that AggregateRelationId isn't listed in the ObjectProperty
array, so I think recordExtObjInitPriv() will begin erroring for that
classoid instead of ignoring it like we do today.



Hmm, we do have some extensions in contrib that add aggregates (citext,
intagg).  I suspect that the aggregate function is actually registered
into the extension via its pg_proc entry, so this wouldn't actually
matter.  But maybe the commenting should be clearer?


Yeah, I don't believe that AggregateRelationId is used in object
addresses; we just refer to pg_proc for any kind of function including
aggregates.  Note that there is no "oid" column in pg_aggregate.


Got it, thanks for clarifying.


I have updated the patch as you suggested and split out the aggregate 
issue into a separate patch for clarity.
From 570643e44f17b1284a4639ab2d14bfcbbd487b7c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 16 Jan 2023 11:46:13 +0100
Subject: [PATCH v2 1/2] Remove AggregateRelationId from recordExtObjInitPriv()

This was erroneous, because AggregateRelationId has no OID, so it
cannot be part of an extension directly.  (Aggregates are registered
via pg_proc.)  No harm in practice, but better to make it correct.

Discussion: 
https://www.postgresql.org/message-id/flat/504bc485-6bd6-dd1b-fe10-e7351aeb3...@enterprisedb.com
---
 src/backend/catalog/aclchk.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index cc6e260908..7cb2faa187 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -4510,7 +4510,6 @@ recordExtObjInitPriv(Oid objoid, Oid classoid)
ReleaseSysCache(tuple);
}
else if (classoid == AccessMethodRelationId ||
-classoid == AggregateRelationId ||
 classoid == CastRelationId ||
 classoid == CollationRelationId ||
 classoid == ConversionRelationId ||

base-commit: 20428d344a2964de6aaef9984fcd472f3c65d115
-- 
2.39.0

From d093fe1384f9f95779ccd13caa3f2967508c3c58 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 16 Jan 2023 11:49:24 +0100
Subject: [PATCH v2 2/2] Refactor recordExtObjInitPriv()

Instead of half a dozen of mostly-duplicate conditional branches,
write one common one that can handle most catalogs.  We already have
all the information we need, such as which system catalog corresponds
to which catalog table and which column is the ACL column.

Reviewed-by: Nathan Bossart 
Discussion: 
https://www.postgresql.org/message-id/flat/504bc485-6bd6-dd1b-fe10-e7351aeb3...@enterprisedb.com
---
 src/backend/catalog/aclchk.c | 152 ++-
 1 file changed, 8 insertions(+), 144 deletions(-)

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 7cb2faa187..c4232344aa 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -4241,9 +4241,6 @@ recordDependencyOnNewAcl(Oid classId, Oid objectId, int32 objsubId,
  *
  * For the object passed in, this will record its ACL (if any) and the ACLs of
  * any sub-objects (eg: columns) into pg_init_privs.
- *
- * Any new kinds of objects which have ACLs associated with them and can be
- * added to an extension should be added to the if-else tree below.
  */
 void
 recordExtObjInitPriv(Oid objoid, Oid classoid)
@@ -4336,74 +4333,6 @@ recordExtObjInitPriv(Oid objoid, Oid classoid)
 
ReleaseSysCache(tuple);
}
-   /* pg_foreign_data_wrapper */
-   else if (classoid == ForeignDataWrapperRelationId)
-   {
-   Datum   aclDatum;
-   boolisNull;
-   HeapTuple   tuple;
-
-   tuple = SearchSysCache1(FOREIGNDATAWRAPPEROID,
-   ObjectIdGetDatum(objoid));
-   if (!HeapTupleIsValid(tuple))
-   elog(ERROR, "cache lookup failed for foreign data wrapper %u",
-objoid);
-
-   aclDatum = SysCacheGetAttr(FOREIGNDATAWRAPPEROID, tuple,
-  Anum_pg_foreign_data_wrapper_fdwacl,
-  &isNull);
-
-   /* Add the record, if any, for the top-level object */
-   if (!isNull)
-   recordExtensionInitPrivWorker(objoid, classoid, 0,
-     DatumGetAclP(aclDatum));
-
-   ReleaseSysCache(tuple);
-   }
-   /* pg_foreign_server */
-   else if (classoid == ForeignServerRelationId)
-   {
- 

Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf

2023-01-16 Thread Jelte Fennema
> Still, I am having a few second thoughts about 0003 after thinking
> about it over the weekend.  Except if I am missing something, there
> are no issues with 0004 if we keep the current behavior of always
> replacing \1 even if pg-user is quoted?  I would certainly add a new
> test case either way.

Yes, 0004 is not dependent on 0003 at all. I attached a new version
of 0003 where only a test and some documentation are added.

> Perhaps it would be simpler to use copy_auth_token() in this code path
> and always free the resulting token?

I initially tried that when working on the patch, but copy_auth_token
(surprisingly) doesn't copy the regex field into the new AuthToken.
So we'd have to regenerate it conditionally. Making the copy
conditional seemed just as simple code-wise, with the added
bonus that it's not doing a useless copy.

> In the code path where system-user is a regexp, could it be better
> to skip the replacement of \1 in the new AuthToken if pg-user is
> itself a regexp?  The compiled regexp would be the same, but it could
> be considered as a bit confusing, as it can be thought that the
> compiled regexp of pg-user happened after the replacement?

I updated 0004 to prioritize membership checks and regexes over
substitution of \1. I also added tests for this. Prioritizing "all" over
substitution of \1 is not necessary, since by definition "all" does
not include \1.


v5-0003-Only-expand-1-in-pg_ident.conf-when-not-quoted.patch
Description: Binary data


v5-0004-Support-same-user-patterns-in-pg_ident.conf-as-in.patch
Description: Binary data


[PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-01-16 Thread Aleksander Alekseev
Hi hackers,

While playing with 64-bit XIDs [1], my attention was drawn to the
following statement in the docs [2]:

"""
If these warnings are ignored, the system will shut down and refuse to
start any new transactions once there are fewer than three million
transactions left until wraparound.
"""

I decided to check this.

Unfortunately it can't be done easily e.g. by modifying
ShmemVariableCache->nextXid in gdb, because the system will PANIC with
something like "could not access status of transaction 12345".
Hopefully [3] will change the situation someday.

Meanwhile I chose the hard way. In one session I did:

```
CREATE TABLE phonebook(
 "id" SERIAL PRIMARY KEY NOT NULL,
  "name" NAME NOT NULL,
  "phone" INT NOT NULL);

BEGIN;
INSERT INTO phonebook VALUES (1, 'Alex', 123);

-- don't commit!

```

Then I did the following:

```
echo "SELECT pg_current_xact_id();" > t.sql
pgbench -j 8 -c 8 -f t.sql -T 86400 eax
```

After 20-24 hours on typical hardware (perhaps faster had I not
forgotten to use `synchronous_commit = off`), pgbench will use up
the XID pool. The old tuples can't be frozen because the transaction
we created in the beginning is still in progress. So now we can
observe what actually happens when the system reaches xidStopLimit.

Firstly, the system doesn't shut down as the documentation says.
Secondly, it executes new transactions just fine as long as these
transactions don't allocate new XIDs.

XIDs are allocated not for every transaction but lazily, when needed
(see backend_xid in pg_stat_activity). A transaction doesn't need an
assigned XID to check the visibility of tuples. Rather, it uses the
xmin horizon, and only when using an isolation level above READ
COMMITTED; see backend_xmin in pg_stat_activity. Assigning an xmin
horizon doesn't increase nextXid.

As a result, PostgreSQL can still execute read-only transactions even
after reaching xidStopLimit. Similarly to how it can do this on hot
standby replicas without having conflicts with the leader server.

Thirdly, if there was a transaction created before reaching
xidStopLimit, it will continue to execute after reaching xidStopLimit,
and it can be successfully committed.

All in all, the actual behavior is far from "system shutdown" and
"refusing to start any new transactions". It's closer to entering
read-only mode, similarly to what hot standbys allow to do.

The proposed patchset changes the documentation and the error messages
accordingly, making them less misleading. 0001 corrects the
documentation but doesn't touch the code. 0002 and 0003 correct the
messages shown when approaching xidWrapLimit and xidWarnLimit
accordingly.

Thoughts?

[1]: https://commitfest.postgresql.org/41/3594/
[2]: 
https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
[3]: https://commitfest.postgresql.org/41/3729/

-- 
Best regards,
Aleksander Alekseev


v1-0003-Fix-the-message-in-case-of-exceeding-xidWarnLimit.patch
Description: Binary data


v1-0002-Fix-the-message-in-case-of-approaching-xidWrapLim.patch
Description: Binary data


v1-0001-Correct-the-docs-about-preventing-XID-wraparound.patch
Description: Binary data

