Re: emit recovery stats via a new file or a new hook

2021-11-02 Thread Amit Kapila
On Mon, Nov 1, 2021 at 12:30 AM Andres Freund  wrote:
>
> I'm not sure that the new log messages aren't sufficient. But if they
> aren't, it seems better to keep additional data in the stats system, and
> make them visible via views, rather than adding yet another place to
> keep stats.
>

+1. This is exactly what came to my mind after reading Bharath's email.

-- 
With Regards,
Amit Kapila.




Re: Logical insert/update/delete WAL records for custom table AMs

2021-11-02 Thread Amit Kapila
On Sun, Oct 31, 2021 at 11:40 PM Jeff Davis  wrote:
>
> Attached is a WIP patch to add new WAL records to represent a logical
> insert, update, or delete. These records do not do anything at REDO
> time, they are only processed during logical decoding/replication.
>
> These are intended to be used by a custom table AM, like my columnar
> compression extension[0], which currently supports physical replication
> but can't support logical decoding/replication because decoding is not
> extensible. Using these new logical records would be redundant, making
> inserts/updates/deletes less efficient, but at least logical decoding
> would work (the lack of which is columnar's biggest weakness).
>
> Alternatively, we could support extensible WAL with extensible
> decoding. I also like this approach, but it takes more work for an AM
> like columnar to get that right -- it needs to keep additional state in
> the walsender to track and assemble the compressed columns stored
> across many blocks. It also requires a lot of care, because mistakes
> can get you into serious trouble.
>
> This proposal, for new logical records without WAL extensibility,
> provides a more shallow ramp to get a table AM working (including
> logical replication/decoding) without the need to invest in the WAL
> design. Later, of course I'd like the option for extensible WAL as well
> (to be more efficient), but right now I'd prefer it just worked
> (inefficiently).
>

You have modeled DecodeLogicalInsert on the current DecodeInsert, and
it generates the same change message, so I am not sure how exactly these
new messages will differ from the current heap_insert/update/delete
messages. Also, we have code that deals with the different types of
messages at various places during decoding, so if the new messages are
different then we probably need to handle them at those places as well.

-- 
With Regards,
Amit Kapila.




RE: Added schema level support for publication.

2021-11-02 Thread houzj.f...@fujitsu.com
On Wed, Nov 3, 2021 12:25 PM vignesh C  wrote:
> On Wed, Nov 3, 2021 at 8:30 AM Amit Kapila 
> wrote:
> >
> > On Tue, Nov 2, 2021 at 8:13 PM Tomas Vondra
> >  wrote:
> > >
> > >
> > > >
> > > > Yeah, that is also true. So maybe at this point, we can just rename
> > > > the few types as suggested by you, and we can look at it later if we
> > > > ever have more objects to add.
> > > >
> > >
> > > +1
> > >
> >
> > Apart from what you have pointed above, we are using
> > "DO_PUBLICATION_REL_IN_SCHEMA" in pg_dump. I think we should replace
> > that as well with "DO_PUBLICATION_TABLE_IN_SCHEMA".
> 
> Thanks for the comments, the attached patch has the changes for the same.
> 

Thanks for the patch.
I have only one minor comment:

+   PUBLICATIONOBJ_TABLE_IN_SCHEMA, /* Relations in schema type */

I think 'Relations' in the comment also needs to be changed to 'Tables'.

The other part of the patch looks good to me.

Best regards,
Hou zj




Re: [PATCH] Native spinlock support on RISC-V

2021-11-02 Thread Thomas Munro
On Wed, Nov 3, 2021 at 5:13 PM Andres Freund  wrote:
> Any chance you could enable jit_dump_bitcode and manually try a failing
> query? That should dump .bc files in the data directory. That might allow
> debugging this outside the emulated environment.
>
> I don't see where the undef is originating from, but I think it might suggest
> that something led to that code being considered unreachable.

postgres=# set jit_above_cost = 0;
SET
postgres=# select 1 + 1;
FATAL:  fatal llvm error: Cannot select: 0x4b3ec1a0: i64,ch =
load<(load 8 from %ir.14)> 0x41ef6fe8, 0x4b3ec138, undef:i64
  0x4b3ec138: i64 = add nuw 0x4b3eab60, Constant:i64<24>
0x4b3eab60: i64,ch = load<(load 8 from %ir.7)> 0x41ef6fe8,
0x4b3eaaf8, undef:i64
  0x4b3eaaf8: i64 = add nuw 0x4b3ea068, Constant:i64<16>
0x4b3ea068: i64,ch = CopyFromReg 0x41ef6fe8, Register:i64 %4
  0x4b3ea000: i64 = Register %4
0x4b3ea888: i64 = Constant<16>
  0x4b3ea6e8: i64 = undef
0x4b3ea9c0: i64 = Constant<24>
  0x4b3ea6e8: i64 = undef
In function: evalexpr_0_0

Ah, I hadn't noticed this in the log before:

'generic' is not a recognized processor for this target (ignoring processor)

Sounds kinda serious :-)

Resulting .bc files and .ll files (produced by llvm-dis) attached.


75392.0.bc
Description: Binary data


75392.0.optimized.bc
Description: Binary data


75392.0.ll
Description: Binary data


75392.0.optimized.ll
Description: Binary data


Re: Added schema level support for publication.

2021-11-02 Thread vignesh C
On Wed, Nov 3, 2021 at 8:30 AM Amit Kapila  wrote:
>
> On Tue, Nov 2, 2021 at 8:13 PM Tomas Vondra
>  wrote:
> >
> >
> > >
> > > Yeah, that is also true. So maybe at this point, we can just rename the
> > > few types as suggested by you, and we can look at it later if we
> > > ever have more objects to add.
> > >
> >
> > +1
> >
>
> Apart from what you have pointed above, we are using
> "DO_PUBLICATION_REL_IN_SCHEMA" in pg_dump. I think we should replace
> that as well with "DO_PUBLICATION_TABLE_IN_SCHEMA".

Thanks for the comments, the attached patch has the changes for the same.

Regards,
Vignesh
From 533de4b3024e12b033d82995a3b920b51f2a7d64 Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Wed, 3 Nov 2021 09:21:22 +0530
Subject: [PATCH] Renamed enums that included REL to TABLE in publish schema
 feature.

Renamed PUBLICATIONOBJ_REL_IN_SCHEMA, PUBLICATIONOBJ_CURRSCHEMA,
DO_PUBLICATION_REL_IN_SCHEMA and PRIO_PUBLICATION_REL_IN_SCHEMA
to PUBLICATIONOBJ_TABLE_IN_SCHEMA, PUBLICATIONOBJ_TABLE_IN_CURRSCHEMA,
DO_PUBLICATION_TABLE_IN_SCHEMA and PRIO_PUBLICATION_TABLE_IN_SCHEMA in the
publish schema feature. This change is required to avoid confusion with other
objects, such as sequences, which will be supported in the future.
---
 src/backend/commands/publicationcmds.c |  8 
 src/backend/parser/gram.y  | 12 ++--
 src/bin/pg_dump/pg_dump.c  |  6 +++---
 src/bin/pg_dump/pg_dump.h  |  2 +-
 src/bin/pg_dump/pg_dump_sort.c |  6 +++---
 src/include/nodes/parsenodes.h |  5 +++--
 6 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index d1fff13d2e..c8c45d9426 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -169,13 +169,13 @@ ObjectsInPublicationToOids(List *pubobjspec_list, ParseState *pstate,
 			case PUBLICATIONOBJ_TABLE:
 *rels = lappend(*rels, pubobj->pubtable);
 break;
-			case PUBLICATIONOBJ_REL_IN_SCHEMA:
+			case PUBLICATIONOBJ_TABLE_IN_SCHEMA:
 schemaid = get_namespace_oid(pubobj->name, false);
 
 /* Filter out duplicates if user specifies "sch1, sch1" */
 *schemas = list_append_unique_oid(*schemas, schemaid);
 break;
-			case PUBLICATIONOBJ_CURRSCHEMA:
+			case PUBLICATIONOBJ_TABLE_IN_CURRSCHEMA:
 search_path = fetch_search_path(false);
 if (search_path == NIL) /* nothing valid in search_path? */
 	ereport(ERROR,
@@ -214,7 +214,7 @@ CheckObjSchemaNotAlreadyInPublication(List *rels, List *schemaidlist,
 
 		if (list_member_oid(schemaidlist, relSchemaId))
 		{
-			if (checkobjtype == PUBLICATIONOBJ_REL_IN_SCHEMA)
+			if (checkobjtype == PUBLICATIONOBJ_TABLE_IN_SCHEMA)
 ereport(ERROR,
 		errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		errmsg("cannot add schema \"%s\" to publication",
@@ -613,7 +613,7 @@ AlterPublicationSchemas(AlterPublicationStmt *stmt,
 		rels = OpenReliIdList(reloids);
 
 		CheckObjSchemaNotAlreadyInPublication(rels, schemaidlist,
-			  PUBLICATIONOBJ_REL_IN_SCHEMA);
+			  PUBLICATIONOBJ_TABLE_IN_SCHEMA);
 
 		CloseTableList(rels);
 		PublicationAddSchemas(pubform->oid, schemaidlist, false, stmt);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d0eb80e69c..8683dc1a37 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -9662,14 +9662,14 @@ PublicationObjSpec:
 			| ALL TABLES IN_P SCHEMA ColId
 {
 	$$ = makeNode(PublicationObjSpec);
-	$$->pubobjtype = PUBLICATIONOBJ_REL_IN_SCHEMA;
+	$$->pubobjtype = PUBLICATIONOBJ_TABLE_IN_SCHEMA;
 	$$->name = $5;
 	$$->location = @5;
 }
 			| ALL TABLES IN_P SCHEMA CURRENT_SCHEMA
 {
 	$$ = makeNode(PublicationObjSpec);
-	$$->pubobjtype = PUBLICATIONOBJ_CURRSCHEMA;
+	$$->pubobjtype = PUBLICATIONOBJ_TABLE_IN_CURRSCHEMA;
 	$$->location = @5;
 }
 			| ColId
@@ -17351,17 +17351,17 @@ preprocess_pubobj_list(List *pubobjspec_list, core_yyscan_t yyscanner)
 pubobj->name = NULL;
 			}
 		}
-		else if (pubobj->pubobjtype == PUBLICATIONOBJ_REL_IN_SCHEMA ||
- pubobj->pubobjtype == PUBLICATIONOBJ_CURRSCHEMA)
+		else if (pubobj->pubobjtype == PUBLICATIONOBJ_TABLE_IN_SCHEMA ||
+ pubobj->pubobjtype == PUBLICATIONOBJ_TABLE_IN_CURRSCHEMA)
 		{
 			/*
 			 * We can distinguish between the different type of schema
 			 * objects based on whether name and pubtable is set.
 			 */
 			if (pubobj->name)
-pubobj->pubobjtype = PUBLICATIONOBJ_REL_IN_SCHEMA;
+pubobj->pubobjtype = PUBLICATIONOBJ_TABLE_IN_SCHEMA;
 			else if (!pubobj->name && !pubobj->pubtable)
-pubobj->pubobjtype = PUBLICATIONOBJ_CURRSCHEMA;
+pubobj->pubobjtype = PUBLICATIONOBJ_TABLE_IN_CURRSCHEMA;
 			else
 ereport(ERROR,
 		errcode(ERRCODE_SYNTAX_ERROR),
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index b9635a95b6..7e98371d25 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ 

Re: [PATCH] Native spinlock support on RISC-V

2021-11-02 Thread Andres Freund
Hi, 

On November 2, 2021 3:55:58 PM PDT, Thomas Munro  wrote:
>2.  When configured with all options on FreeBSD 13, everything looks
>good so far except LLVM JIT, which fails with various "Cannot select"
>errors.  Clang works fine for compiling PostgreSQL itself.  Tested
>with LLVM 12 (LLVM has supported RISCV since 9).  Example:
>
>+FATAL:  fatal llvm error: Cannot select: 0x4f772068: ch = brcond
>0x4f770f70, 0x4f772208, BasicBlock:ch< 0x4f76d600>
>+  0x4f772208: i64 = setcc 0x4f7723a8, Constant:i64<0>, setlt:ch
>+0x4f7723a8: i64,ch = load<(load 4 from `i32* inttoptr (i64
>1260491408 to i32*)`, align 16), sext from i32> 0x4fdee058,
>Constant:i64<1260491408>, undef:i64
>+  0x4f770a90: i64 = Constant<1260491408>
>+  0x4f7703a8: i64 = undef
>+0x4f7701a0: i64 = Constant<0>
>+In function: evalexpr_0_0

Any chance you could enable jit_dump_bitcode and manually try a failing query?
That should dump .bc files in the data directory. That might allow debugging
this outside the emulated environment.

I don't see where the undef is originating from, but I think it might suggest
that something led to that code being considered unreachable.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: parallel vacuum comments

2021-11-02 Thread Amit Kapila
On Tue, Nov 2, 2021 at 11:17 AM Masahiko Sawada  wrote:
>
> On Tue, Nov 2, 2021 at 5:57 AM Peter Geoghegan  wrote:
> >
>
> > Rather than inventing PARALLEL_VACUUM_KEY_INDVAC_CHECK (just for
> > assert-enabled builds), we should invent PARALLEL_VACUUM_STATS -- a
> > dedicated shmem area for the array of LVSharedIndStats (no more
> > storing LVSharedIndStats entries at the end of the LVShared space in
> > an ad-hoc, type unsafe way). There should be one array element for
> > each and every index -- even those indexes where parallel index
> > vacuuming is unsafe or not worthwhile (unsure if avoiding parallel
> > processing for "not worthwhile" indexes actually makes sense, BTW). We
> > can then get rid of the bitmap/IndStatsIsNull() stuff entirely. We'd
> > also add new per-index state fields to LVSharedIndStats itself. We
> > could directly record the status of each index (e.g., parallel unsafe,
> > amvacuumcleanup processing done, ambulkdelete processing done)
> > explicitly. All code could safely subscript the LVSharedIndStats array
> > directly, using idx style integers. That seems far more robust and
> > consistent.
>
> Sounds good.
>
> During development, I wrote the patch trying to use less shared
> memory, but it seems that brought complexity (and therefore the
> bug). It would not be harmful in practice even if we allocate index
> statistics in DSM for unsafe and "not worthwhile" indexes.
>

If we want to allocate index stats for all indexes in DSM, then why not
consider doing it along the lines of buf/wal_usage, i.e., track those via
LVParallelState? And probably replace the bitmap with an array of bools
that indicates which indexes can be skipped by the parallel worker.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2021-11-02 Thread Amit Kapila
On Tue, Nov 2, 2021 at 2:17 PM Masahiko Sawada  wrote:
>
> On Tue, Nov 2, 2021 at 2:35 PM Amit Kapila  wrote:
> >
> > On Mon, Nov 1, 2021 at 7:18 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Fri, Oct 29, 2021 at 8:20 PM Amit Kapila  
> > > wrote:
> > > >
> > > >
> > > > Fair enough. So statistics can be removed either by vacuum or drop
> > > > subscription. Also, if we go by this logic then there is no harm in
> > > > retaining the stat entries for tablesync errors. Why have different
> > > > behavior for apply and tablesync workers?
> > >
> > > My understanding is that the subscription worker statistics entry
> > > corresponds to workers (but not physical workers since the physical
> > > process is changed after restarting). So if the worker finishes its
> > > jobs, it is no longer necessary to show errors since further problems
> > > will not occur after that. Table sync worker’s job finishes when
> > > completing table copy (unless table sync is performed again by REFRESH
> > > PUBLICATION) whereas apply worker’s job finishes when the subscription
> > > is dropped.
> > >
> >
> > Actually, I am not very sure how users can use the old error
> > information after we allowed skipping the conflicting xid. Say, if
> > they want to add/remove some constraints on the table based on
> > previous errors then they might want to refer to errors of both the
> > apply worker and table sync worker.
>
> I think that in general, statistics should be retained as long as a
> corresponding object exists in the database, like other cumulative
> statistics views. So I'm concerned about an entry of a cumulative stats
> view being automatically removed by a non-stats-related function (e.g.,
> ALTER SUBSCRIPTION SKIP), which seems like new behavior for cumulative
> stats views.
>
> We can retain the stats entries for table sync worker but what I want
> to avoid is that the view shows many old entries that will never be
> updated. I've sometimes seen cases where the user mistakenly restored
> table data on the subscriber before creating a subscription, failed
> table sync on many tables due to unique violation, and truncated
> tables on the subscriber. I think that unlike the stats entries for
> apply worker, retaining the stats entries for table sync could be
> harmful since it’s likely to be a large amount (even hundreds of
> entries). In particular, it could bloat the stats file, since each entry
> includes an error message. So if we do that, I'd like to provide a function
> for users to remove (not reset) stats entries manually.
>

If we follow the idea of keeping stats at the db level (in
PgStat_StatDBEntry) as discussed above, then I think we already have a
way to remove stat entries via pg_stat_reset, which removes the stats
corresponding to tables and functions (and, after this patch, to
subscriptions as well) for the current database. Won't that be
sufficient? I see your point, but I think it may be better if we keep
the same behavior for the stats of apply and table sync workers.

Following the convention for tables and functions, I thought of naming
the reset function similarly to "pg_stat_reset_single_table_counters",
but I feel the name currently used in the patch,
"pg_stat_reset_subscription_worker", is better. Do let me know what you
think.

-- 
With Regards,
Amit Kapila.




Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Jan Wieck

On 11/2/21 20:02, Tom Lane wrote:


My objection to log_checkpoints=on is that that's going to produce
a constant stream of messages even when *nothing at all* is wrong.
Worse yet, a novice DBA would likely have a hard time understanding
from those messages whether there was anything to worry about or not.
If we could design a variant of log_checkpoints that would produce
output only when the situation really needs attention, I'd be fine
with enabling that by default.



Making log_checkpoints an enum, as already suggested, might do that. Or
(also already suggested) we could elevate the log level of checkpoint
messages once checkpoints have been happening because of WAL volume for
a while.


The thing I don't want to see us doing is *nothing at all*, when pretty
much everyone with some customer experience in the field is saying "this
is the information we want to see post-incident, and nobody has it, so
we sit there waiting for the next time it happens."



Regards

--
Jan Wieck




Re: pgsql: Document XLOG_INCLUDE_XID a little better

2021-11-02 Thread Amit Kapila
On Thu, Oct 28, 2021 at 8:15 AM Amit Kapila  wrote:
>
> On Wed, Oct 27, 2021 at 4:39 PM Dilip Kumar  wrote:
> >
> > On Tue, Oct 26, 2021 at 9:19 AM Amit Kapila  wrote:
> > >
> > > >
> > > > Thanks, both your patches look good to me except that we need to
> > > > remove the sentence related to the revert of ade24dab97 from the
> > > > commit message. I think we should backpatch the first patch to 14
> > > > where it was introduced and commit the second patch (related to moving
> > > > code out of critical section) only to HEAD but we can even backpatch
> > > > the second one till 9.6 for the sake of consistency. What do you guys
> > > > think?
> > > >
> > >
> > > The other option could be to just commit both these patches in HEAD as
> > > there is no correctness issue here.
> >
> > Right, even I feel we should just commit it to the HEAD as there is no
> > correctness issue.
> >
>
> Thanks for your opinion. I'll commit it to the HEAD by next Tuesday
> unless someone feels that we should backpatch this.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: Added schema level support for publication.

2021-11-02 Thread Amit Kapila
On Tue, Nov 2, 2021 at 8:13 PM Tomas Vondra
 wrote:
>
>
> >
> > Yeah, that is also true. So maybe at this point, we can just rename the
> > few types as suggested by you, and we can look at it later if we
> > ever have more objects to add.
> >
>
> +1
>

Apart from what you have pointed above, we are using
"DO_PUBLICATION_REL_IN_SCHEMA" in pg_dump. I think we should replace
that as well with "DO_PUBLICATION_TABLE_IN_SCHEMA".

-- 
With Regards,
Amit Kapila.




Re: parallel vacuum comments

2021-11-02 Thread Masahiko Sawada
On Wed, Nov 3, 2021 at 11:53 AM Peter Geoghegan  wrote:
>
> On Tue, Nov 2, 2021 at 7:35 PM Masahiko Sawada  wrote:
> > It returns true in the above condition but it should return false
> > since the index doesn't support parallel index cleanup at all. It
> > seems that this bug was introduced by commit b4af70cb21 (therefore
> > exists only in PG14) which flipped the return values of this function
> > but missed one place. The index AMs that don't support parallel index
> > cleanup at all are affected by this bug. Among the supported index AM
> > in the core, hash indexes are affected but since they just return the
> > number of blocks during vacuumcleanup it would not become a serious
> > consequence.
> >
> > I've attached a patch to fix it.
>
> I pushed your fix just now.

Thanks!

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: parallel vacuum comments

2021-11-02 Thread Peter Geoghegan
On Tue, Nov 2, 2021 at 7:35 PM Masahiko Sawada  wrote:
> It returns true in the above condition but it should return false,
> since the index doesn't support parallel index cleanup at all. It
> seems that this bug was introduced by commit b4af70cb21 (and therefore
> exists only in PG14), which flipped the return values of this function
> but missed one place. The index AMs that don't support parallel index
> cleanup at all are affected by this bug. Among the index AMs supported
> in core, hash indexes are affected, but since they just return the
> number of blocks during vacuumcleanup, the consequences are not
> serious.
>
> I've attached a patch to fix it.

I pushed your fix just now.

Thanks
-- 
Peter Geoghegan




Re: parallel vacuum comments

2021-11-02 Thread Masahiko Sawada
On Tue, Nov 2, 2021 at 2:46 PM Masahiko Sawada  wrote:
>
> Anyway, I'll write a patch accordingly.

While writing a patch for these comments, I found another bug in
parallel_processing_is_safe():

/*
 * Returns false, if the given index can't participate in parallel index
 * vacuum or parallel index cleanup
 */
static bool
parallel_processing_is_safe(Relation indrel, LVShared *lvshared)
{
uint8   vacoptions = indrel->rd_indam->amparallelvacuumoptions;

/* first_time must be true only if for_cleanup is true */
Assert(lvshared->for_cleanup || !lvshared->first_time);

if (lvshared->for_cleanup)
{
/* Skip, if the index does not support parallel cleanup */
if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
return true;

It returns true in the above condition but it should return false,
since the index doesn't support parallel index cleanup at all. It
seems that this bug was introduced by commit b4af70cb21 (and therefore
exists only in PG14), which flipped the return values of this function
but missed one place. The index AMs that don't support parallel index
cleanup at all are affected by this bug. Among the index AMs supported
in core, hash indexes are affected, but since they just return the
number of blocks during vacuumcleanup, the consequences are not
serious.

I've attached a patch to fix it.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


fix_parallel_processing_is_safe.patch
Description: Binary data


Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Nikolay Samokhvalov
On Tue, Nov 2, 2021 at 5:02 PM Tom Lane  wrote:

> I'm still of the position that the default ought to be that a
> normally-functioning server generates no ongoing log output.
> Only people who have got Nagios watching their logs, or some
> such setup, are going to want anything different.  And that is
> a minority use-case.  There are going to be way more people
> bitching because their postmaster log overflowed their disk
> than there will be people who are happier because you made
> such output the default.  (Don't forget that our default
> logging setup does not rotate the logs.)
>

Is it known how many new Postgres installations come from
popular packages (which have log rotation enabled) compared
to installations that are custom-built and managed in their own way?

If people do not use packages these days, they need to take care
of such things themselves – including log rotation and, for example,
autostart. The same people who might complain of overflowed
disks should already be complaining about Postgres not surviving
machine restarts, right?

Nik


Re: AArch64 has single-copy 64 bit atomicity

2021-11-02 Thread Alexander Korotkov
On Wed, Nov 3, 2021 at 1:34 AM Thomas Munro  wrote:
> Andres mentioned in passing that he'd defined
> PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY only on Itanium, PPC and x86 but
> not ARM.
>
> I took a look at https://developer.arm.com/documentation/ddi0487/gb/
> under "B2.2.1 Requirements for single-copy atomicity" and it seemed
> like we should turn this on for __aarch64__.  It goes back to the
> original ARMv8-A so should cover all 64 bit ARM systems.

That would be very good, because ARM is gaining popularity and the
effect of atomic reads is significant.

--
Regards,
Alexander Korotkov




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-02 Thread Isaac Morland
On Tue, 2 Nov 2021 at 19:00, Vik Fearing  wrote:

> On 11/2/21 11:14 PM, Vik Fearing wrote:
>
> > This would be nice, but there is nothing to hang our hat on:
> >
> > GRANT CHECKPOINT TO username;
>
> Thinking about this more, why don't we just add CHECKPOINT and
> NOCHECKPOINT attributes to roles?
>
> ALTER ROLE username WITH CHECKPOINT;
>

At present, this would require adding a field to pg_authid. That isn't very
scalable, but we're already creating new pg_* roles that give access to
various actions, so I don't know why a role attribute would be a better
approach. If anything, I think we're more likely to move in the other
direction: replace role attributes that in effect grant privileges with
predefined roles. I think this has already been discussed here in the
context of CREATEROLE.


Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Tom Lane
Bruce Momjian  writes:
> On Sun, Oct 31, 2021 at 10:24:38PM +0100, Michael Banck wrote:
>> It could be a bit of reverse-survivorship-bias, i.e., you're only seeing
>> the pathological cases, while 99% of the Postgres installations out
>> there just hum along fine without anybody ever having to look at the
>> checkpoint entries in the log.
>> 
>> But yeah, for serious production instances, it makes sense to turn that
>> option on, but I'm not convinced it should be the default.

> Yes, I agree with this analysis. There is a sense that people who
> regularly diagnose Postgres problems want this information in the logs
> by default, while the majority of sites might want silent logs for
> normal query activity.

I'm still of the position that the default ought to be that a
normally-functioning server generates no ongoing log output.
Only people who have got Nagios watching their logs, or some
such setup, are going to want anything different.  And that is
a minority use-case.  There are going to be way more people
bitching because their postmaster log overflowed their disk
than there will be people who are happier because you made
such output the default.  (Don't forget that our default
logging setup does not rotate the logs.)

However, I'm prepared to accept a tight definition of what
"normally-functioning" means.  For instance, I don't have a
problem with the idea of having log_autovacuum_min_duration
defaulting to something positive, as long as it's fairly large.
Then it's only going to emit anything if there is a situation
that really needs attention.

My objection to log_checkpoints=on is that that's going to produce
a constant stream of messages even when *nothing at all* is wrong.
Worse yet, a novice DBA would likely have a hard time understanding
from those messages whether there was anything to worry about or not.
If we could design a variant of log_checkpoints that would produce
output only when the situation really needs attention, I'd be fine
with enabling that by default.

regards, tom lane




Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Bruce Momjian
On Sun, Oct 31, 2021 at 10:24:38PM +0100, Michael Banck wrote:
> On Sun, Oct 31, 2021 at 01:16:33PM -0700, Andres Freund wrote:
> > Shrug. It's based on many years of doing or being around people doing
> > postgres support escalation shifts. And it's not like log_checkpoints
> > incurs meaningful overhead or causes that much log volume.
> 
> It could be a bit of reverse-survivorship-bias, i.e., you're only seeing
> the pathological cases, while 99% of the Postgres installations out
> there just hum along fine without anybody ever having to look at the
> checkpoint entries in the log.
> 
> But yeah, for serious production instances, it makes sense to turn that
> option on, but I'm not convinced it should be the default.

Yes, I agree with this analysis. There is a sense that people who
regularly diagnose Postgres problems want this information in the logs
by default, while the majority of sites might want silent logs for
normal query activity.

I realize that in this thread Robert Haas mentioned foreign key
violations that would likely also appear in the logs, but those are
ERROR/WARNING etc. messages that can be filtered out with
log_min_messages.  I think if we want to offer a more verbose set of
analytic output, by default or not, we might want to relabel it as
something other than LOG messages so it can be filtered out with
log_min_messages.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: enabling FOO=bar arguments to vcregress.pl

2021-11-02 Thread Michael Paquier
On Tue, Nov 02, 2021 at 05:59:49PM -0400, Andrew Dunstan wrote:
> On 11/2/21 11:03, Andrew Dunstan wrote:
>> I think you misunderstood the purpose of my email. It wasn't meant to
>> be complete patch.
>>
>>
>> But here's an untested patch that should do almost all of what I want
>> for USE_MODULE_DB.
>
> Now tested and doing what is expected. I'd like to backpatch it. Nobody
> who doesn't set USE_MODULE_DB would be at all affected, but I want to
> enable it for MSVC builds in the buildfarm.

Okay, now I get your intention here, thanks.  Shouldn't you add some
documentation showing how to use this facility when passing down
parameters?
--
Michael




Re: Teach pg_receivewal to use lz4 compression

2021-11-02 Thread Michael Paquier
On Tue, Nov 02, 2021 at 12:31:47PM -0400, Robert Haas wrote:
> On Tue, Nov 2, 2021 at 8:17 AM Magnus Hagander  wrote:
>> I think for the end user, it is strictly better to name it "gzip",
>> and given that the target of this option is the end user we should
>> do so. (It'd be different it we were talking about a build-time
>> parameter to configure). 
> 
> I agree. Also, I think there's actually a file format called "zlib"
> which is slightly different from the "gzip" format, and you have to be
> careful not to generate the wrong one.

Okay, fine by me.  It would be better to be also consistent in
WalCompressionMethods once we switch to this option value, then.
--
Michael




Re: [PATCH] postgres_fdw: suppress explicit casts in text:text comparisons (was: column option to override foreign types)

2021-11-02 Thread Tom Lane
I wrote:
> Now that I've looked this over I'm starting to feel uncomfortable
> again, because we can't actually be quite sure about how the remote
> parser's heuristic will act.

Actually ... we could make that a lot safer by insisting that the
other input be a plain Var, which'd necessarily be a column of the
foreign table.  That would still cover most cases of practical
interest, I think, and it would remove any question of whether
implicit coercions had snuck in.  It's more restrictive than I'd
really like, but I think it's less likely to cause problems.

regards, tom lane




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-02 Thread Vik Fearing
On 11/2/21 11:14 PM, Vik Fearing wrote:

> This would be nice, but there is nothing to hang our hat on:
> 
> GRANT CHECKPOINT TO username;

Thinking about this more, why don't we just add CHECKPOINT and
NOCHECKPOINT attributes to roles?

ALTER ROLE username WITH CHECKPOINT;
-- 
Vik Fearing




Re: [PATCH] Native spinlock support on RISC-V

2021-11-02 Thread Thomas Munro
On Wed, Sep 1, 2021 at 9:22 PM Christoph Berg  wrote:
> Re: Tom Lane
> > I did not like confusing the RISC-V case with the ARM case: duplicating
> > the code block seems better, in case there's ever reason to add
> > arch-specific stuff like SPIN_DELAY.  So I split it off to its own
> > code block and pushed it.
>
> Fwiw I can report success on Debian's riscv port with that commit
> cherry-picked onto 13.4:
>
> https://buildd.debian.org/status/fetch.php?pkg=postgresql-13=riscv64=13.4-3=1630411643=0

A couple of things I noticed on this architecture:

1.  Even though we're using generic built-ins for atomics, I guess we
could still use a src/include/port/atomics/arch-riscv.h file so we
have a place to define PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY when
building for 64 bit.  We'd need to find the chapter-and-verse to
justify that, of course, which I can try to do; if someone more
familiar with the ISA/spec/manual can point to it I'm all ears.

2.  When configured with all options on FreeBSD 13, everything looks
good so far except LLVM JIT, which fails with various "Cannot select"
errors.  Clang works fine for compiling PostgreSQL itself.  Tested
with LLVM 12 (LLVM has supported RISCV since 9).  Example:

+FATAL:  fatal llvm error: Cannot select: 0x4f772068: ch = brcond
0x4f770f70, 0x4f772208, BasicBlock:ch< 0x4f76d600>
+  0x4f772208: i64 = setcc 0x4f7723a8, Constant:i64<0>, setlt:ch
+0x4f7723a8: i64,ch = load<(load 4 from `i32* inttoptr (i64
1260491408 to i32*)`, align 16), sext from i32> 0x4fdee058,
Constant:i64<1260491408>, undef:i64
+  0x4f770a90: i64 = Constant<1260491408>
+  0x4f7703a8: i64 = undef
+0x4f7701a0: i64 = Constant<0>
+In function: evalexpr_0_0
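For point (1), here is a sketch of what such a file might look like. This is
hypothetical: the chapter-and-verse citation justifying
PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY on RV64 would still need to be pinned
down, and it assumes the __riscv/__riscv_xlen predefined macros from the
RISC-V C API.

```c
/*
 * src/include/port/atomics/arch-riscv.h -- hypothetical sketch
 *
 * On RV64, naturally aligned loads and stores up to XLEN (64 bits) are
 * single-copy atomic, so plain 8-byte reads and writes don't need a lock.
 * (Spec reference to be confirmed before anything like this is committed.)
 */
#if defined(__riscv) && defined(__riscv_xlen) && __riscv_xlen == 64
#define PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
#endif
```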




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-02 Thread Isaac Morland
On Tue, 2 Nov 2021 at 18:14, Vik Fearing  wrote:

> On 11/2/21 4:06 PM, Robert Haas wrote:
> > There's bound to be somebody who wants to grant some of
> > these permissions and not others, or who wants to grant the ability to
> > run those commands on some tables but not others.
> Is there anything stopping us from adding syntax like this?
>
> GRANT VACUUM, ANALYZE ON TABLE foo TO bar;
>

There is a limited number of bits available in the way privileges are
stored. I investigated this in 2018 in connection with an idea I had to
allow granting the ability to refresh a materialized view; after
consideration and discussion I came to the idea of having a "MAINTAIN"
permission which would allow refreshing materialized views and would also
cover clustering, reindexing, vacuuming, and analyzing on objects to which
those actions are applicable.

This message from me summarizes the history of usage of the available
privilege bits:

https://www.postgresql.org/message-id/CAMsGm5c4DycKBYZCypfV02s-SC8GwF%2BKeTt%3D%3DvbWrFn%2Bdz%3DKeg%40mail.gmail.com

If you dig into the replies you will find the revised proposal.

> That doesn't fix the CHECKPOINT issue, but surely vacuum and analyze can
> be done that way.  I would much prefer that over new predefined roles.
>
> This would be nice, but there is nothing to hang our hat on:
>
> GRANT CHECKPOINT TO username;
>


Re: [PATCH] postgres_fdw: suppress explicit casts in text:text comparisons (was: column option to override foreign types)

2021-11-02 Thread Tom Lane
"Dian M Fay"  writes:
> Thanks Tom, that makes way more sense! I've attached a new patch which
> tests operands and makes sure one side is a Const before feeding it to
> deparseConst with a new showtype code, -2. The one regression is gone,
> but I've left a couple of test output discrepancies for now which
> showcase lost casts on the following predicates:

> * date(c5) = '1970-01-17'::date
> * ctid = '(0,2)'::tid

> These aren't exactly failures -- both implicit string comparisons work
> just fine -- but I don't know Postgres well enough to be sure that
> that's true more generally.

These seem fine to me.  The parser heuristic that we're relying on
acts at the level of the operator --- it doesn't really care whether
the other input argument is a simple Var or not.

Note that we're *not* doing an "implicit string comparison" in either
case.  The point here is that the remote parser will resolve the
unknown-type literal as being the same type as the other operator input,
that is date or tid in these two cases.

That being the goal, I think you don't have the logic right at all,
even if it happens to accidentally work in the tested cases.  We
can only drop the cast if it's a binary operator and the two inputs
are of the same type.  Testing "leftType == form->oprleft" is pretty
close to a no-op, because the input will have been coerced to the
operator's expected type.  And the code as you had it could do
indefensible things with a unary operator.  (It's probably hard to
get here with a unary operator applied to a constant, but I'm not
sure it's impossible.)

Attached is a rewrite that does what I think we want to do, and
also adds comments because there weren't any.

Now that I've looked this over I'm starting to feel uncomfortable
again, because we can't actually be quite sure about how the remote
parser's heuristic will act.  What we're checking is that leftType
and rightType match, but that condition is applied to the inputs
*after implicit type coercion to the operator's input types*.
We can't be entirely sure about what our parser saw to begin with.
Perhaps it'd be a good idea to strip any implicit coercions on
the non-Const input before checking its type.  I'm not sure how
much that helps though.  For one thing, by the time this code
sees the expression, eval_const_expressions could have collapsed
coercion steps in a way that obscures how it looked originally.
For another thing, in the cases we're interested in, it's kind of
a stretch to suppose that implicit coercions applied locally are
a good model of the way things will look to the remote parser.

So I'm feeling a bit itchy.  I'm still willing to push forward
with this, but I won't be terribly surprised if it breaks cases
that ought to work and we end up having to revert it.

regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index d98bd66681..67d397c354 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2695,9 +2695,12 @@ deparseVar(Var *node, deparse_expr_cxt *context)
  * Deparse given constant value into context->buf.
  *
  * This function has to be kept in sync with ruleutils.c's get_const_expr.
- * As for that function, showtype can be -1 to never show "::typename" decoration,
- * or +1 to always show it, or 0 to show it only if the constant wouldn't be assumed
- * to be the right type by default.
+ * As in that function, showtype can be -1 to never show "::typename"
+ * decoration, or +1 to always show it, or 0 to show it only if the constant
+ * wouldn't be assumed to be the right type by default.  In addition,
+ * this code allows showtype to be -2 to indicate that we should not show
+ * "::typename" decoration if the constant is printed as an untyped literal
+ * or NULL (while in other cases, behaving as for showtype == 0).
  */
 static void
 deparseConst(Const *node, deparse_expr_cxt *context, int showtype)
@@ -2707,6 +2710,7 @@ deparseConst(Const *node, deparse_expr_cxt *context, int showtype)
 	bool		typIsVarlena;
 	char	   *extval;
 	bool		isfloat = false;
+	bool		isstring = false;
 	bool		needlabel;
 
 	if (node->constisnull)
@@ -2762,13 +2766,14 @@ deparseConst(Const *node, deparse_expr_cxt *context, int showtype)
 			break;
 		default:
 			deparseStringLiteral(buf, extval);
+			isstring = true;
 			break;
 	}
 
 	pfree(extval);
 
-	if (showtype < 0)
-		return;
+	if (showtype == -1)
+		return;	/* never print type label */
 
 	/*
 	 * For showtype == 0, append ::typename unless the constant will be
@@ -2788,7 +2793,13 @@ deparseConst(Const *node, deparse_expr_cxt *context, int showtype)
 			needlabel = !isfloat || (node->consttypmod >= 0);
 			break;
 		default:
-			needlabel = true;
+			if (showtype == -2)
+			{
+/* label unless we printed it as an untyped string */
+needlabel = !isstring;
+			}
+			else
+needlabel = true;
 			break;
 	}
 	if (needlabel || showtype > 0)
@@ -2953,6 +2964,8 @@ deparseOpExpr(OpExpr 
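The showtype handling in the patch above can be condensed into a standalone
truth table. This is a hypothetical simplification for illustration only,
covering just the default branch of the type switch (the boolean/numeric
special cases are omitted); needs_type_label() is not a function in the
patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical condensation of the revised deparseConst() showtype logic
 * (default branch of the type switch only):
 *   -1 : never append "::typename"
 *   -2 : append it unless the constant was printed as an untyped string
 *    0 : append it (in the default branch the remote parser can't be
 *        trusted to infer the type)
 *   +1 : always append it
 */
static bool
needs_type_label(int showtype, bool printed_as_string)
{
	bool		needlabel;

	if (showtype == -1)
		return false;			/* never print type label */

	if (showtype == -2)
		needlabel = !printed_as_string; /* label unless untyped string */
	else
		needlabel = true;

	return needlabel || showtype > 0;
}
```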

AArch64 has single-copy 64 bit atomicity

2021-11-02 Thread Thomas Munro
Hi,

Andres mentioned in passing that he'd defined
PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY only on Itanium, PPC and x86 but
not ARM.

I took a look at https://developer.arm.com/documentation/ddi0487/gb/
under "B2.2.1 Requirements for single-copy atomicity" and it seemed
like we should turn this on for __aarch64__.  It goes back to the
original ARMv8-A so should cover all 64 bit ARM systems.
From 1721c7b15b132c2aa16b18cf0b3eea054247833d Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 3 Nov 2021 10:23:33 +1300
Subject: [PATCH] AArch64 has single-copy 64 bit atomicity.

This avoids using more expensive atomic operations when plain loads and
stores would do.
---
 src/include/port/atomics/arch-arm.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/include/port/atomics/arch-arm.h b/src/include/port/atomics/arch-arm.h
index efa385321e..a5b84baa71 100644
--- a/src/include/port/atomics/arch-arm.h
+++ b/src/include/port/atomics/arch-arm.h
@@ -24,3 +24,8 @@
 #if !defined(__aarch64__) && !defined(__aarch64)
 #define PG_DISABLE_64_BIT_ATOMICS
 #endif  /* __aarch64__ || __aarch64 */
+
+#if defined(__aarch64__)
+/* per architecture manual doubleword accesses have single copy atomicity */
+#define PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
+#endif
-- 
2.25.1



Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-02 Thread David G. Johnston
On Tue, Nov 2, 2021 at 3:14 PM Vik Fearing  wrote:

> On 11/2/21 4:06 PM, Robert Haas wrote:
> > There's bound to be somebody who wants to grant some of
> > these permissions and not others, or who wants to grant the ability to
> > run those commands on some tables but not others.
> Is there anything stopping us from adding syntax like this?
>
> GRANT VACUUM, ANALYZE ON TABLE foo TO bar;
>
> That doesn't fix the CHECKPOINT issue, but surely vacuum and analyze can
> be done that way.  I would much prefer that over new predefined roles.
>
> This would be nice, but there is nothing to hang our hat on:
>
> GRANT CHECKPOINT TO username;
>
>
Here is the thread when I last brought up this idea five years ago:

https://www.postgresql.org/message-id/CAKFQuwaAhVt6audf92Q1VrELfJ%2BPz%3DuDfNb8%3D1_bqAmyDpnDmA%40mail.gmail.com

I do not believe we've actually consumed any of the then available
permission bits in the meanwhile.

David J.


Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-02 Thread Vik Fearing
On 11/2/21 4:06 PM, Robert Haas wrote:
> There's bound to be somebody who wants to grant some of
> these permissions and not others, or who wants to grant the ability to
> run those commands on some tables but not others.
Is there anything stopping us from adding syntax like this?

GRANT VACUUM, ANALYZE ON TABLE foo TO bar;

That doesn't fix the CHECKPOINT issue, but surely vacuum and analyze can
be done that way.  I would much prefer that over new predefined roles.

This would be nice, but there is nothing to hang our hat on:

GRANT CHECKPOINT TO username;
-- 
Vik Fearing




Re: enabling FOO=bar arguments to vcregress.pl

2021-11-02 Thread Andrew Dunstan


On 11/2/21 11:03, Andrew Dunstan wrote:
> On 11/1/21 21:23, Michael Paquier wrote:
>> On Mon, Nov 01, 2021 at 11:33:21AM -0400, Andrew Dunstan wrote:
>>> As I mentioned recently at
>>> , 
>>> I want to get USE_MODULE_DB working for vcregress.pl. I started out
>>> writing code to strip this from the command line or get it from the
>>> environment, but then it struck me that it would be better to implement
>>> a general Makefile-like mechanism for handling FOO=bar type arguments on
>>> the command line, along the lines of the attached.
>> I am not sure I understand how that will work with USE_MODULE_DB
>> which sets up the database names used by the regression tests.  Each
>> target's module has its own needs in terms of settings that can be
>> used, meaning that you would still need some boilerplate to do the
>> mapping between a variable and its command argument?
>
>
> I think you misunderstood the purpose of my email. It wasn't meant to
be a complete patch.
>
>
> But here's an untested patch that should do almost all of what I want
> for USE_MODULE_DB.
>
>


Now tested and doing what is expected. I'd like to backpatch it. Nobody
who doesn't set USE_MODULE_DB would be at all affected, but I want to
enable it for MSVC builds in the buildfarm.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: XTS cipher mode for cluster file encryption

2021-11-02 Thread Bruce Momjian
On Mon, Nov  1, 2021 at 02:24:36PM -0400, Stephen Frost wrote:
> I can understand the general idea that we should be sure to engineer
> this in a way that multiple methods can be used, as surely one day folks
> will say that AES128 isn't acceptable any more.  In terms of what we'll
> do from the start, I would think providing the options of AES128 and
> AES256 would be good to ensure that we have the bits covered to support
> multiple methods and I don't think that would put us into a situation of

My patch supports AES128, AES192, and AES256.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Peter Geoghegan
On Tue, Nov 2, 2021 at 1:47 PM Robert Haas  wrote:
> Yeah. I have only very rarely run into cases where people actually end
> up needing multiple passes, but it's always something I need to rule
> out as part of the troubleshooting process, and it's hard to do that
> without the log_autovacuum_min_duration output. It's pretty common for
> me to see cases where, for example, the I/O performed by autovacuum
> read a bunch of data off disk, which shoved a bunch of hot data out of
> cache, and then performance tanked really hard.

Right. Indexes are a big source of the variability IME. I agree that
having to do more than one pass isn't all that common. More often it's
something to do with the fact that there are 20 indexes, or the fact
that they use UUID indexes. VACUUM can very easily end up dirtying far
more pages than might be expected, just because of these kinds of
variations.

Variability in the duration of VACUUM due to issues on the heapam side
tends to be due to things like the complicated relationship between
work done and XIDs consumed, the fact that autovacuum scheduling makes
VACUUM kick in at geometric intervals/in geometric series (the scale
factor is applied to the size of the table at the end of the last
autovacuum), and problems with maintaining accurate information about
the number of dead tuples in the table.

> Or where the vacuum
> cost limit is ridiculously low relative to the table size and the
> vacuums take an unreasonably long time. In those kinds of cases what
> you really need to know is that there was a vacuum on a certain table,
> and how long it took, and when that happened.

In short: very often everything is just fine, but when things do go
wrong, the number of plausible sources of trouble is vast. There is no
substitute for already having reasonably good instrumentation of
VACUUM in place, to show how things have changed over time. The time
dimension is often very important, for experts and non-experts alike.

The more success we have with improving VACUUM (e.g., the VM/freeze
map stuff), the more likely it is that the notable remaining problems
will be complicated and kind of weird. That's what I see.

--
Peter Geoghegan




Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Vik Fearing
On 10/31/21 10:24 PM, Michael Banck wrote:
> To put another option on the table: maybe a compromise could be to log
> xlog checkpoints unconditionally, and the (checkpoint_timeout) time ones
> only if log_checkpoints are set (maybe with some exponential backoff to
> avoid log spam)?

If we're going to do something like that, we should convert it from a
boolean to an enum.

log_checkpoints = wal

(I'm being very careful not to slip on that slope.)
-- 
Vik Fearing




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-02 Thread Bossart, Nathan
On 11/2/21, 11:27 AM, "Stephen Frost"  wrote:
> * Bossart, Nathan (bossa...@amazon.com) wrote:
>> The approach in the patch looks alright to me, but another one could
>> be to build a SelectStmt when parsing CHECKPOINT.  I think that'd
>> simplify the standard_ProcessUtility() changes.
>
> For my 2c, at least, I'm not really partial to either approach, though
> I'd want to see what error messages end up looking like.  Seems like we
> might want to exercise a bit more control than we'd be able to if we
> transformed it directly into a SelectStmt (that is, we might add a HINT:
> roles with execute rights on pg_checkpoint() can run this command, or
> something; maybe not too tho).

I don't feel strongly one way or the other as well, but you have a
good point about extra control over the error messages.  The latest
patch just does a standard aclcheck_error(), so you'd probably see
"permission denied for function" if you didn't have privileges for
CHECKPOINT.  That could be confusing.

Nathan



Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Robert Haas
On Tue, Nov 2, 2021 at 3:05 PM Peter Geoghegan  wrote:
> I think that 10 minutes is fine. But if it has to be 30 minutes, then
> that's also probably fine.

+1.

> I think that the case for enabling autovacuum logging is particularly
> good. The really big problems with autovacuum often involve
> anti-wraparound VACUUM, where you really want to have every possible
> opportunity to learn why VACUUM took much longer than expected.

+1.

> Going from doing index vacuuming in one single pass to requiring more
> than one pass can very significantly delay things, quite suddenly.
> Even when you have 95% of the maintenance_work_mem required to process
> indexes in a single pass, it might not be that much better than 50% or
> less. It's way too unpredictable in cases where users actually run
> into real problems -- cases where it actually matters.

Yeah. I have only very rarely run into cases where people actually end
up needing multiple passes, but it's always something I need to rule
out as part of the troubleshooting process, and it's hard to do that
without the log_autovacuum_min_duration output. It's pretty common for
me to see cases where, for example, the I/O performed by autovacuum
read a bunch of data off disk, which shoved a bunch of hot data out of
cache, and then performance tanked really hard. Or where the vacuum
cost limit is ridiculously low relative to the table size and the
vacuums take an unreasonably long time. In those kinds of cases what
you really need to know is that there was a vacuum on a certain table,
and how long it took, and when that happened.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: make tuplestore helper function

2021-11-02 Thread Justin Pryzby
Several places have a conditional value for the first argument (randomAccess),
but your patch changes the behavior to a constant "true".  I didn't review the
patch beyond that.

> @@ -740,18 +724,14 @@ pg_prepared_statement(PG_FUNCTION_ARGS)
> - tupstore =
> - tuplestore_begin_heap(rsinfo->allowedModes & SFRM_Materialize_Random,
> -   false, work_mem);

> @@ -2701,42 +2701,13 @@ pg_hba_file_rules(PG_FUNCTION_ARGS)
> - tuple_store =
> - tuplestore_begin_heap(rsi->allowedModes & SFRM_Materialize_Random,
> -   false, work_mem);

> @@ -4799,31 +4797,8 @@ pg_timezone_names(PG_FUNCTION_ARGS)
> - randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
> - tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);

> @@ -575,38 +575,12 @@ pg_ls_dir_1arg(PG_FUNCTION_ARGS)
> - randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
> - tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);

> @@ -1170,17 +1154,12 @@ pg_cursor(PG_FUNCTION_ARGS)
> - tupstore =
> - tuplestore_begin_heap(rsinfo->allowedModes & SFRM_Materialize_Random,
> -   false, work_mem);

> +++ b/src/backend/utils/fmgr/funcapi.c
> + tupstore = tuplestore_begin_heap(true, false, maxKBytes);

-- 
Justin




Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Justin Pryzby
On Sun, Oct 31, 2021 at 10:59:19AM -0400, Tom Lane wrote:
> Bharath Rupireddy  writes:
> > How about we enable it out of the box?
> 
> No.
> 
> The general policy at the moment is that a normally-functioning server
> should emit *no* log traffic by default (other than a few messages
> at startup and shutdown).  log_checkpoints is a particularly poor
> candidate for an exception to that policy, because it would produce so
> much traffic.  No DBA would be likely to consider it as anything but
> log spam.

Huh?  As I see it, this is something that everyone should look at, at least
sometimes, and may be particularly important when deploying a new instance.  In
order to set shared_buffers/checkpoint_timeout/wal_size, it's important to know
what fraction of the buffers were dirty during a typical/peak checkpoint.  And
important to know if fsync is taking an unreasonably large amount of time.

In cases where log_checkpoints writes "so much", checkpoint_warning would
already be complaining anyway.

I think writing a low volume of "routine" messages would allow some confidence
that a server is functioning normally.  If there's nothing being logged at all,
maybe logging isn't working, maybe I'm looking in the wrong place, or maybe the
server hasn't completed a checkpoint in the last 3 weeks and we're in deep
trouble - the absence of any logs at all does not support high confidence in
the health of one's cluster.

Note my somewhat recent message at
https://www.postgresql.org/message-id/20210615161830.gq31...@telsasoft.com:

On Tue, Jun 15, 2021 at 11:18:30AM -0500, Justin Pryzby wrote:
> I propose to change some defaults:
> 
> log_autovacuum_min_duration = 0
> log_checkpoints = on
> log_lock_waits = on (and log_recovery_conflict_waits too?)
> log_temp_files = 128kB

Robert opined that autovacuum shouldn't be zero, but shouldn't be disabled
either, which is fine.  Any value that doesn't completely disable logging these
events would be preferable - same for log_temp_files.




Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Michael Banck
Hi,

On Tue, Nov 02, 2021 at 11:55:23AM -0400, Robert Haas wrote:
> I think shipping with log_checkpoints=on and
> log_autovacuum_min_duration=10m or so would be one of the best things
> we could possibly do to allow ex-post-facto troubleshooting of
> system-wide performance issues.

I don't disagree, but while we're at it, I'd like to throw
log_lock_waits into the ring as well. IME, once you get to a situation
where it pops up, the overall log volume usually dwarfs it, but it's
pretty important to be able to answer the "why was that query slow 5
days ago" question.


Michael

-- 
Michael Banck
Team Lead PostgreSQL
Project Manager
Tel.: +49 2166 9901-171
Mail: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Geoff Richardson, Peter Lilley

Our handling of personal data is subject to:
https://www.credativ.de/en/contact/privacy/




Re: prevent immature WAL streaming

2021-11-02 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-Oct-13, Andres Freund wrote:
>> Yea, let's go for your patch then. I've verified that at least locally it
>> passes under valgrind.

> Ah great, thanks.  Pushed then.

Seems like this hasn't fixed the problem: skink still fails on
this test occasionally.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2021-10-22%2013%3A52%3A00
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2021-11-02%2000%3A28%3A30

Both of these look like

#   Failed test '00010002 differs from 00010002'
#   at t/026_overwrite_contrecord.pl line 61.
# Looks like you failed 1 test of 3.
t/026_overwrite_contrecord.pl  
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/3 subtests 

which looks like the same thing we were seeing before.
010e52337 seems to have just decreased the probability of failure.

regards, tom lane




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2021-11-02 Thread Melanie Plageman
v14 attached.

On Tue, Oct 19, 2021 at 3:29 PM Andres Freund  wrote:
>
>
> > > Is pgstattuple the best place for this helper? It's not really pgstatfuncs
> > > specific...
> > >
> > > It also looks vaguely familiar - I wonder if we have a helper roughly like
> > > this somewhere else already...
> > >
> >
> > I don't see a function which is specifically a utility to make a
> > tuplestore. Looking at the callers of tuplestore_begin_heap(), I notice
> > very similar code to the function I added in pg_tablespace_databases()
> > in utils/adt/misc.c, pg_stop_backup_v2() in xlogfuncs.c,
> > pg_event_trigger_dropped_objects() and pg_event_trigger_ddl_commands in
> > event_tigger.c, pg_available_extensions in extension.c, etc.
> >
> > Do you think it makes sense to refactor this code out of all of these
> > places?
>
> Yes, I think it'd make sense. We have about 40 copies of this stuff, which is
> fairly ridiculous.
>
>
> > If so, where would such a utility function belong?
>
> Not quite sure. src/backend/utils/fmgr/funcapi.c maybe? I suggest starting a
> separate thread about that...
>

done [1]. also, I dropped that commit from this patchset.

>
> > > > @@ -2999,6 +3036,14 @@ pgstat_shutdown_hook(int code, Datum arg)
> > > >  {
> > > >   Assert(!pgstat_is_shutdown);
> > > >
> > > > + /*
> > > > +  * Only need to send stats on IO Ops for IO Paths when a process 
> > > > exits, as
> > > > +  * pg_stat_get_buffers() will read from live backends' 
> > > > PgBackendStatus and
> > > > +  * then sum this with totals from exited backends persisted by 
> > > > the stats
> > > > +  * collector.
> > > > +  */
> > > > + pgstat_send_buffers();
> > > > +
> > > >   /*
> > > >* If we got as far as discovering our own database ID, we can 
> > > > report what
> > > >* we did to the collector.  Otherwise, we'd be sending an invalid
> > > > @@ -3092,6 +3137,30 @@ pgstat_send(void *msg, int len)
> > > >  #endif
> > > >  }
> > >
> > > I think it might be nicer to move pgstat_beshutdown_hook() to be a
> > > before_shmem_exit(), and do this in there.
> > >
> >
> > Not really sure the correct way to do this. A cursory attempt to do so
> > failed because ShutdownXLOG() is also registered as a
> > before_shmem_exit() and ends up being called after
> > pgstat_beshutdown_hook(). pgstat_beshutdown_hook() zeroes out
> > PgBackendStatus, ShutdownXLOG() initiates a checkpoint, and during a
> > checkpoint, the checkpointer increments IO op counter for writes in its
> > PgBackendStatus.
>
> I think we'll really need to do a proper redesign of the shutdown callback
> mechanism :(.
>

I've left what I originally had, then.

>
>
> > > > +PgBackendStatus *
> > > > +pgstat_fetch_backend_statuses(void)
> > > > +{
> > > > + return BackendStatusArray;
> > > > +}
> > >
> > > Hm, not sure this adds much?
> >
> > Is there a better way to access the whole BackendStatusArray from within
> > pgstatfuncs.c?
>
> Export the variable itself?
>

done but wasn't sure about PGDLLIMPORT

>
> > > IIRC Thomas Munro had a patch adding a nonatomic_add or such
> > > somewhere. Perhaps in the recovery readahead thread? Might be worth using
> > > instead?
> > >
> >
> > I've added Thomas' function in a separate commit. I looked for a better
> > place to add it (I was thinking somewhere in src/backend/utils/misc) but
> > couldn't find anywhere that made sense.
>
> I think it should just live in atomics.h?
>

done

-- melanie

[1] 
https://www.postgresql.org/message-id/flat/CAAKRu_azyd1Z3W_r7Ou4sorTjRCs%2BPxeHw1CWJeXKofkE6TuZg%40mail.gmail.com
From 41242606a03aace906e307a38fc67c5cefcaec20 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 11 Oct 2021 16:15:06 -0400
Subject: [PATCH v14 2/4] Read-only atomic backend write function

For counters in shared memory which can be read by any backend but only
written to by one backend, an atomic is still needed to protect against
torn values, however, pg_atomic_fetch_add_u64() is overkill for
incrementing the counter. inc_counter() is a helper function which can
be used to increment these values safely but without unnecessary
overhead.

Author: Thomas Munro
---
 src/include/port/atomics.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
index 856338f161..09a2575d6a 100644
--- a/src/include/port/atomics.h
+++ b/src/include/port/atomics.h
@@ -519,6 +519,16 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 sub_)
 	return pg_atomic_sub_fetch_u64_impl(ptr, sub_);
 }
 
+/*
+ * On modern systems this is really just *counter++.  On some older systems
+ * there might be more to it, due to inability to read and write 64 bit values
+ * atomically.
+ */
+static inline void inc_counter(pg_atomic_uint64 *counter)
+{
+	pg_atomic_write_u64(counter, pg_atomic_read_u64(counter) + 1);
+}
+
 #undef INSIDE_ATOMICS_H
 
 #endif			/* ATOMICS_H */
-- 
2.30.2
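For what it's worth, the single-writer pattern in the patch can be
illustrated outside PostgreSQL with plain C11 atomics. This is a
hypothetical stand-in for pg_atomic_read_u64()/pg_atomic_write_u64(), not
part of the patch itself:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Single-writer counter: exactly one thread ever writes, any thread may
 * read.  A relaxed load plus a relaxed store is enough -- with only one
 * writer there is no read-modify-write race that could lose an update,
 * and the atomic accesses still rule out torn 64-bit values on hardware
 * that cannot read or write 8 bytes in one go.
 */
static inline void
inc_counter(_Atomic uint64_t *counter)
{
	atomic_store_explicit(counter,
						  atomic_load_explicit(counter,
											   memory_order_relaxed) + 1,
						  memory_order_relaxed);
}
```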


Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Peter Geoghegan
On Tue, Nov 2, 2021 at 11:50 AM Robert Haas  wrote:
> I almost proposed 1m rather than 10m, but then I thought the better of
> it. I think it's unlikely that an autovacuum that takes 1 minute is
> really the cause of some big problem you're having on your system.
> Typical problem cases I see are hours or days long, so even 10 minutes
> is pretty short compared to what's likely to cause you real issues.
> And at the same time 10 minutes is also high enough that you won't get
> frequent log messages.

I think that 10 minutes is fine. But if it has to be 30 minutes, then
that's also probably fine.

I think that the case for enabling autovacuum logging is particularly
good. The really big problems with autovacuum often involve
anti-wraparound VACUUM, where you really want to have every possible
opportunity to learn why VACUUM took much longer than expected.

Going from doing index vacuuming in one single pass to requiring more
than one pass can very significantly delay things, quite suddenly.
Even when you have 95% of the maintenance_work_mem required to process
indexes in a single pass, it might not be that much better than 50% or
less. It's way too unpredictable in cases where users actually run
into real problems -- cases where it actually matters.

-- 
Peter Geoghegan




Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Nikolay Samokhvalov
On Tue, Nov 2, 2021 at 11:50 AM Robert Haas  wrote:

> I almost proposed 1m rather than 10m, but then I thought the better of
> it. I think it's unlikely that an autovacuum that takes 1 minute is
> really the cause of some big problem you're having on your system.
> Typical problem cases I see are hours or days long, so even 10 minutes
> is pretty short.
>

I'm talking about the autoANALYZE part, not VACUUM. In my case, it was a
few tables ~100GB-1TB in size, with 1-2 GIN indexes (with fastupdate and
the default 4MB pending-list size limit), 10 workers with a quite high
throttling bar, and default_statistics_target = 1000. Observed
autoANALYZE timing reached dozens of minutes, sometimes ~1 hour for a
table. The problem is that ANALYZE (unlike VACUUM) apparently holds a
snapshot and takes an XID -- and if it runs that long, this leads to
issues on standbys. I'm going to post the findings in a separate thread,
but the point is that an autoANALYZE running for minutes *may* cause big
performance issues. That's why 1m seems a good threshold to me, even if
it leads to having 3 log entries per minute from 3 workers. That's quite
low log traffic, but the data there is really useful for retrospective
analysis.


Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Robert Haas
On Tue, Nov 2, 2021 at 2:39 PM Nikolay Samokhvalov
 wrote:
> +1 for  log_checkpoints = on
> and +1 for log_autovacuum_min_duration = 1m or so.

I almost proposed 1m rather than 10m, but then I thought the better of
it. I think it's unlikely that an autovacuum that takes 1 minute is
really the cause of some big problem you're having on your system.
Typical problem cases I see are hours or days long, so even 10 minutes
is pretty short compared to what's likely to cause you real issues.
And at the same time 10 minutes is also high enough that you won't get
frequent log messages. 1 minute might not be: with 3 autovacuum
workers by default, that could print out a message every 20 seconds,
which does not feel worthwhile. I think a 10 minute threshold is much
more likely to only capture events that you actually care about.

Now I do think that a lot of people would benefit from a lower setting
than 10 minutes, just to get more information about what's happening.
But IMHO, making the default as low as 1 minute is a bit much.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Alvaro Herrera
On 2021-Nov-02, Nikolay Samokhvalov wrote:

> Back to checkpoint logging. With log_checkpoints = off, and high write
> activity with low max_wal_size we're already "spamming" the logs with
> lots of "checkpoints are occurring too frequently" – and this happens
> very often, any DBA running a restore process on Postgres with default
> max_wal_size (1GB, very low for modern DBs) saw it.
> 
> Without details, this definitely looks like "spam"

Speaking of which -- I think we could easily remove checkpoint_warning
without any loss of useful functionality.  Or, if we have to keep it, we
could change the way it works: when that condition triggers, then cause
regular "checkpoint started/complete" messages to occur (if they are
disabled), rather than the current "checkpoints are occurring too
frequently" message.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"E pur si muove" (Galileo Galilei)




Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Alvaro Herrera
On 2021-Nov-02, Robert Haas wrote:

> I think shipping with log_checkpoints=on and
> log_autovacuum_min_duration=10m or so would be one of the best things
> we could possibly do to allow ex-post-facto troubleshooting of
> system-wide performance issues. The idea that users care more about
> the inconvenience of a handful of extra log messages than they do
> about being able to find problems when they have them matches no part
> of my experience.

I agree.

There are things that are much more likely to be unhelpful and
irritating -- say, enabling log_connections by default.  Such messages
would be decidedly useless for a large fraction of users and a burden.
That's not something you can claim about checkpoints and large-autovac
messages, though; not only because they are much less frequent, but also
because each line concisely represents a large amount of work.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica" (Irulan)




Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Nikolay Samokhvalov
On Tue, Nov 2, 2021 at 8:55 AM Robert Haas  wrote:

> I think shipping with log_checkpoints=on and
> log_autovacuum_min_duration=10m or so would be one of the best things
> we could possibly do to allow ex-post-facto troubleshooting of
> system-wide performance issues.
>

Fully agree, it would be really great. Glad that you added autovacuum
logging
into the picture.

Maybe 1m would be better – I recently observed a high-TPS system where
autoANALYZE took very long, causing a huge slowdown on standbys that began
a few minutes after ANALYZE was launched.
Correlation, and then causation, was confirmed only thanks to
log_autovacuum_min_duration being configured -- without it, no one would
have been able to understand what happened.

Back to checkpoint logging. With log_checkpoints = off and high write
activity with a low max_wal_size, we're already "spamming" the logs with
lots of "checkpoints are occurring too frequently" messages – and this
happens very often; any DBA who has run a restore process on Postgres with
the default max_wal_size (1GB, very low for modern DBs) has seen it.

Without details, this definitely looks like "spam" – and it's already
happening here and there. The details provided by log_checkpoints = on
would let us base a decision to reconfigure max_wal_size on data,
not guesswork.

+1 for  log_checkpoints = on
and +1 for log_autovacuum_min_duration = 1m or so.
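Concretely, the proposal amounts to shipping defaults along these lines in postgresql.conf (a sketch of the values discussed in this thread; 10m is Robert's suggestion, 1m is mine):

```
# proposed out-of-the-box settings (values still under discussion)
log_checkpoints = on
log_autovacuum_min_duration = '10min'   # or '1min', per the discussion above
```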


Re: [Bug] Logical Replication failing if the DateStyle is different in Publisher & Subscriber

2021-11-02 Thread Tom Lane
Pushed with some adjustment of the comments.  I also simplified the
datestyle setting to just "ISO", because that's sufficient: that
DateStyle doesn't care about DateOrder.  Since the settings are
supposed to match what pg_dump uses, it's just confusing if they don't.

Also, I didn't commit the test case.  It was useful for development,
but it seemed entirely too expensive to keep forevermore compared to its
likely future value.  It increased the runtime of 100_bugs.pl by about
a third, and I'm afraid the likely future value is nil.  The most likely
bug in this area would be introducing some new GUC that we need to set
and forgetting to do so here; but this test case could not expose that.

regards, tom lane




Re: Eval expression R/O once time (src/backend/executor/execExpr.c)

2021-11-02 Thread Andres Freund
On 2021-11-02 13:43:46 -0400, Tom Lane wrote:
> Ranier Vilela  writes:
> > It seems that 1ec7679f1b67e84be688a311dce234eeaa1d5de8 caused the problem.
> 
> Indeed.  Fix pushed.

Thanks to both of you!




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-02 Thread Stephen Frost
Greetings,

* Jeff Davis (pg...@j-davis.com) wrote:
> On Tue, 2021-11-02 at 11:06 -0400, Robert Haas wrote:
> > Just as a sort of general comment on this endeavor, I suspect that
> > any
> > attempt to lump things together that seem closely related is doomed
> > to
> > backfire.
> 
> Agreed, I think that is apparent from the different opinions in this
> thread.
> 
> Robert had a good idea over here though:

Think you meant 'Stephen' there. ;)

> https://postgr.es/m/20211101165025.gs20...@tamriel.snowman.net
> 
> which gives fine-grained control without the "clutter" of extra
> predefined roles.

Right.

* Bossart, Nathan (bossa...@amazon.com) wrote:
> On 11/2/21, 10:29 AM, "Jeff Davis"  wrote:
> > Great idea! Patch attached.
> >
> > This feels like a good pattern that we might want to use elsewhere, if
> > the need arises.
> 
> The approach in the patch looks alright to me, but another one could
> be to build a SelectStmt when parsing CHECKPOINT.  I think that'd
> simplify the standard_ProcessUtility() changes.

For my 2c, at least, I'm not really partial to either approach, though
I'd want to see what error messages end up looking like.  Seems like we
might want to exercise a bit more control than we'd be able to if we
transformed it directly into a SelectStmt (that is, we might add a HINT:
roles with execute rights on pg_checkpoint() can run this command, or
something; maybe not too tho).

> Otherwise, I see a couple of warnings when compiling:
> xlogfuncs.c:54: warning: implicit declaration of function ‘RequestCheckpoint’
> xlogfuncs.c:56: warning: control reaches end of non-void function

Yeah, such things would need to be cleaned up, of course.

Thanks!

Stephen




Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Jan Wieck

On 11/2/21 12:09, Peter Geoghegan wrote:

On Tue, Nov 2, 2021 at 8:55 AM Robert Haas  wrote:

I think shipping with log_checkpoints=on and
log_autovacuum_min_duration=10m or so would be one of the best things
we could possibly do to allow ex-post-facto troubleshooting of
system-wide performance issues. The idea that users care more about
the inconvenience of a handful of extra log messages than they do
about being able to find problems when they have them matches no part
of my experience. I suspect that many users would be willing to incur
*way more* useless log messages than those settings would ever
generate if it meant that they could actually find problems when and
if they have them.


I fully agree.



+1

--
Jan Wieck




Re: make tuplestore helper function

2021-11-02 Thread Alvaro Herrera
On 2021-Nov-02, Melanie Plageman wrote:

> Attached is a patch to create a helper function which creates a
> tuplestore to be used by FUNCAPI-callable functions.

Looks good, and given the amount of code being removed, it seems a
no-brainer that some helper(s) is/are needed.

I think the name is a bit too generic.  How about
MakeFuncResultTuplestore()?

I think the function should not modify rsinfo -- having that as
(undocumented) side effect is weird.  I would suggest to just return the
tuplestore, and have the caller stash it in rsinfo->setResult and modify
the other struct members alongside that.  It's a couple more lines per
caller, but the code is clearer that way.

> There are a few places with very similar code to the new helper
> (MakeTuplestore()) but which, for example, don't expect the return type
> to be a row (pg_ls_dir(), pg_tablespace_databases(), text_to_table()). I
> wasn't sure if it was worth modifying it for their benefit.

Is this just because of the tupdesc?  If that's the case, then maybe the
tupdesc can be optionally passed in by caller, and when it's not, the
helper does get_call_result_type() and the tupdesc is used as output
argument.

> I wasn't sure if the elog() check for call result type should be changed
> to an ereport().

I don't see a reason for that: it's still an internal (not user-facing) error.

> All callers of MakeTuplestore() pass work_mem as the third parameter, so
> I could just omit it and hard-code it in, but I wasn't sure that felt
> right in a utility/helper function.

No opinion.  I would have used the global in the function instead of
passing it in.

> I inlined deflist_to_tuplestore() in foreign.c since it was billed as a
> helper function but wasn't used elsewhere and it made it harder to use
> MakeTuplestore() in this location.

This is a bit strange.  Does it work to pass rsinfo->expectedDesc?

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"No renuncies a nada. No te aferres a nada."




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-02 Thread Bossart, Nathan
On 11/2/21, 10:29 AM, "Jeff Davis"  wrote:
> Great idea! Patch attached.
>
> This feels like a good pattern that we might want to use elsewhere, if
> the need arises.

The approach in the patch looks alright to me, but another one could
be to build a SelectStmt when parsing CHECKPOINT.  I think that'd
simplify the standard_ProcessUtility() changes.

Otherwise, I see a couple of warnings when compiling:
xlogfuncs.c:54: warning: implicit declaration of function ‘RequestCheckpoint’
xlogfuncs.c:56: warning: control reaches end of non-void function

Nathan



Re: Eval expression R/O once time (src/backend/executor/execExpr.c)

2021-11-02 Thread Tom Lane
Ranier Vilela  writes:
> It seems that 1ec7679f1b67e84be688a311dce234eeaa1d5de8 caused the problem.

Indeed.  Fix pushed.

regards, tom lane




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-02 Thread Jeff Davis
On Tue, 2021-11-02 at 11:06 -0400, Robert Haas wrote:
> Just as a sort of general comment on this endeavor, I suspect that
> any
> attempt to lump things together that seem closely related is doomed
> to
> backfire.

Agreed, I think that is apparent from the different opinions in this
thread.

Robert had a good idea over here though:

https://postgr.es/m/20211101165025.gs20...@tamriel.snowman.net

which gives fine-grained control without the "clutter" of extra
predefined roles.

Regards,
Jeff Davis






Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread David Steele

On 11/2/21 12:35 PM, Julien Rouhaud wrote:
> On Wed, Nov 3, 2021 at 00:18, Andrew Dunstan wrote:
>>
>> On 11/2/21 12:09, Peter Geoghegan wrote:
>>> On Tue, Nov 2, 2021 at 8:55 AM Robert Haas wrote:
>>>> I think shipping with log_checkpoints=on and
>>>> log_autovacuum_min_duration=10m or so would be one of the best things
>>>> we could possibly do to allow ex-post-facto troubleshooting of
>>>> system-wide performance issues. The idea that users care more about
>>>> the inconvenience of a handful of extra log messages than they do
>>>> about being able to find problems when they have them matches no part
>>>> of my experience. I suspect that many users would be willing to incur
>>>> *way more* useless log messages than those settings would ever
>>>> generate if it meant that they could actually find problems when and
>>>> if they have them.
>>> I fully agree.
>>
>> /metoo
>
> same here


+1

--
-David
da...@pgmasters.net




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-02 Thread Jeff Davis
On Mon, 2021-11-01 at 12:50 -0400, Stephen Frost wrote:
> All that said, I wonder if we can have our cake and eat it too.  I
> haven't looked into this at all yet and perhaps it's foolish on its
> face, but, could we make CHECKPOINT; basically turn around and just
> run
> select pg_checkpoint(); with the regular privilege checking
> happening?
> Then we'd keep the existing syntax working, but if the user is
> allowed
> to run the command would depend on if they've been GRANT'd EXECUTE
> rights on the function or not.

Great idea! Patch attached.

This feels like a good pattern that we might want to use elsewhere, if
the need arises.
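To make the pattern concrete, usage with the attached patch would look something like this (pg_checkpoint() is the function the patch adds; the role name is hypothetical, for illustration only):

```sql
-- Allow a non-superuser maintenance role to checkpoint;
-- "ops" is a hypothetical role name.
GRANT EXECUTE ON FUNCTION pg_checkpoint() TO ops;

-- "ops" can now run either form; both go through the same privilege check:
--   SELECT pg_checkpoint();
--   CHECKPOINT;
```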

Regards,
Jeff Davis

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4b49dff2ffc..1e3152c39b1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25487,6 +25487,23 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
  
 
  
+  
+   
+
+ pg_checkpoint
+
+pg_checkpoint ()
+   
+   
+Request an immediate checkpoint. This function implements the  command, so the behavior is identical. This
+function is restricted to superusers by default, but other users can
+be granted EXECUTE to run the function. If a user has permission to
+execute this function, they also have permission to issue a
+CHECKPOINT command.
+   
+  
+
   

 
diff --git a/doc/src/sgml/ref/checkpoint.sgml b/doc/src/sgml/ref/checkpoint.sgml
index 2afee6d7b59..ec2c1a62050 100644
--- a/doc/src/sgml/ref/checkpoint.sgml
+++ b/doc/src/sgml/ref/checkpoint.sgml
@@ -52,7 +52,10 @@ CHECKPOINT
   
 
   
-   Only superusers can call CHECKPOINT.
+   By default, only superusers can call CHECKPOINT, but
+   permission can be granted to other users by granting privileges on the
+   pg_checkpoint()
+   function.
   
  
 
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index b98deb72ec6..7ecaca47885 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -44,6 +44,17 @@
 static StringInfo label_file;
 static StringInfo tblspc_map_file;
 
+/*
+ * Implements the CHECKPOINT command. To allow non-superusers to perform the
+ * CHECKPOINT command, grant privileges on this function.
+ */
+Datum
+pg_checkpoint(PG_FUNCTION_ARGS)
+{
+	RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT |
+	  (RecoveryInProgress() ? 0 : CHECKPOINT_FORCE));
+}
+
 /*
  * pg_start_backup: set up for taking an on-line backup dump
  *
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 54c93b16c4c..4437eb3010b 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -603,6 +603,8 @@ AS 'unicode_is_normalized';
 -- available to superuser / cluster owner, if they choose.
 --
 
+REVOKE EXECUTE ON FUNCTION pg_checkpoint() FROM public;
+
 REVOKE EXECUTE ON FUNCTION pg_start_backup(text, boolean, boolean) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stop_backup() FROM public;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index bf085aa93b2..cb55544d7be 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -24,6 +24,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/namespace.h"
+#include "catalog/objectaccess.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/alter.h"
@@ -67,6 +68,7 @@
 #include "tcop/pquery.h"
 #include "tcop/utility.h"
 #include "utils/acl.h"
+#include "utils/fmgroids.h"
 #include "utils/guc.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
@@ -939,13 +941,29 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			break;
 
 		case T_CheckPointStmt:
-			if (!superuser())
-ereport(ERROR,
-		(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-		 errmsg("must be superuser to do CHECKPOINT")));
+			{
+/*
+ * Invoke pg_checkpoint(). Implementing the CHECKPOINT command
+ * with a function allows administrators to grant privileges
+ * on the CHECKPOINT command by granting privileges on the
+ * pg_checkpoint() function. It also calls the function
+ * execute hook, if present.
+ */
+AclResult	aclresult;
+FmgrInfo	flinfo;
+
+aclresult = pg_proc_aclcheck(F_PG_CHECKPOINT, GetUserId(),
+			 ACL_EXECUTE);
+if (aclresult != ACLCHECK_OK)
+	aclcheck_error(aclresult, OBJECT_FUNCTION,
+   get_func_name(F_PG_CHECKPOINT));
 
-			RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT |
-			  (RecoveryInProgress() ? 0 : CHECKPOINT_FORCE));
+InvokeFunctionExecuteHook(F_PG_CHECKPOINT);
+
+fmgr_info(F_PG_CHECKPOINT, &flinfo);
+
+(void) FunctionCall0Coll(&flinfo, InvalidOid);
+			}
 			break;
 
 		case T_ReindexStmt:
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 

Re: archive modules

2021-11-02 Thread Bossart, Nathan
On 11/2/21, 9:46 AM, "Robert Haas"  wrote:
> On Tue, Nov 2, 2021 at 12:39 PM Bossart, Nathan  wrote:
>> On 11/2/21, 9:17 AM, "Robert Haas"  wrote:
>> You could still introduce GUCs in _PG_init(), but they couldn't be
>> defined as PGC_POSTMASTER.
>
> It seems like PGC_POSTMASTER isn't very desirable anyway. Wouldn't you
> want PGC_SIGHUP? I mean I'm not saying there couldn't be a case where
> that wouldn't work, because you could need a big chunk of shared
> memory allocated at startup time or something. But that's probably
> not typical, and if it does happen, well, then that particular archive
> module has to be preloaded. If you know that you have several archive
> modules that you want to use, you can still preload them all if for
> this or any other reason you want to do that. But in a lot of cases
> it's not going to be necessary.
>
> In other words, if we use hooks, then you're guaranteed to need a
> server restart to change anything. If we use something like what you
> have now, there can be corner cases where you need that or benefits of
> preloading, but it's not a hard requirement, and in many cases you can
> get by without it. That seems strictly better to me ... but maybe I'm
> still confused.
>
>> Also, you could still use
>> RegisterDynamicBackgroundWorker() to register a background worker, but
>> you couldn't use RegisterBackgroundWorker().  These might be
>> acceptable restrictions if swapping archive libraries on the fly seems
>> more important, but I wanted to bring that front and center to make
>> sure everyone understands the tradeoffs.
>
> RegisterDynamicBackgroundWorker() seems way better, though. It's hard
> for me to understand why this would be a problem for anybody. And
> again, if somebody does have that need, they can always fall back to
> saying that their particular module should be preloaded if you want to
> use it.

I agree.  I'll make sure the archive library can be changed via SIGHUP
in the next revision.

Nathan



Re: Add additional information to src/test/ssl/README

2021-11-02 Thread Kevin Burke
The patch looks great. Thanks!

Kevin

On Sun, Oct 31, 2021 at 3:13 PM Tom Lane  wrote:

> Daniel Gustafsson  writes:
> > On 31 Oct 2021, at 19:28, Tom Lane  wrote:
> >> Here's a quick stab at rearranging src/test/perl/README so that the
> >> initial section is all how-to-run-the-tests info, and advice about
> >> writing new tests starts after that.  Your point about PG_TEST_NOCLEAN
> >> could be added into the initial section.
>
> > That's pretty much just what I was thinking, definitely +1 on this patch.
>
> Done that way; feel free to add more material to perl/README.
>
> regards, tom lane
>


Re: archive modules

2021-11-02 Thread Robert Haas
On Tue, Nov 2, 2021 at 12:39 PM Bossart, Nathan  wrote:
> On 11/2/21, 9:17 AM, "Robert Haas"  wrote:
> You could still introduce GUCs in _PG_init(), but they couldn't be
> defined as PGC_POSTMASTER.

It seems like PGC_POSTMASTER isn't very desirable anyway. Wouldn't you
want PGC_SIGHUP? I mean I'm not saying there couldn't be a case where
that wouldn't work, because you could need a big chunk of shared
memory allocated at startup time or something. But that's probably
not typical, and if it does happen, well, then that particular archive
module has to be preloaded. If you know that you have several archive
modules that you want to use, you can still preload them all if for
this or any other reason you want to do that. But in a lot of cases
it's not going to be necessary.

In other words, if we use hooks, then you're guaranteed to need a
server restart to change anything. If we use something like what you
have now, there can be corner cases where you need that or benefits of
preloading, but it's not a hard requirement, and in many cases you can
get by without it. That seems strictly better to me ... but maybe I'm
still confused.

> Also, you could still use
> RegisterDynamicBackgroundWorker() to register a background worker, but
> you couldn't use RegisterBackgroundWorker().  These might be
> acceptable restrictions if swapping archive libraries on the fly seems
> more important, but I wanted to bring that front and center to make
> sure everyone understands the tradeoffs.

RegisterDynamicBackgroundWorker() seems way better, though. It's hard
for me to understand why this would be a problem for anybody. And
again, if somebody does have that need, they can always fall back to
saying that their particular module should be preloaded if you want to
use it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: archive modules

2021-11-02 Thread Bossart, Nathan
On 11/2/21, 9:17 AM, "Robert Haas"  wrote:
> On Tue, Nov 2, 2021 at 12:10 PM Bossart, Nathan  wrote:
>> Yes, that seems doable.  My point is that I've intentionally chosen to
>> preload the libraries at the moment so that it's possible to define
>> PGC_POSTMASTER GUCs and to use RegisterBackgroundWorker().  If we
>> think that switching archive modules without restarting is more
>> important, I believe we will need to take on a few restrictions.
>
> I guess I'm failing to understand what the problem is. You can set
> GUCs of the form foo.bar in postgresql.conf anyway, right?

I must not be explaining it well, sorry.  I'm mainly thinking about
the following code snippets.

In guc.c:

	/*
	 * Only allow custom PGC_POSTMASTER variables to be created during shared
	 * library preload; any later than that, we can't ensure that the value
	 * doesn't change after startup.  This is a fatal elog if it happens; just
	 * erroring out isn't safe because we don't know what the calling loadable
	 * module might already have hooked into.
	 */
	if (context == PGC_POSTMASTER &&
		!process_shared_preload_libraries_in_progress)
		elog(FATAL, "cannot create PGC_POSTMASTER variables after startup");

In bgworker.c:

	if (!process_shared_preload_libraries_in_progress &&
		strcmp(worker->bgw_library_name, "postgres") != 0)
	{
		if (!IsUnderPostmaster)
			ereport(LOG,
					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
					 errmsg("background worker \"%s\": must be registered in shared_preload_libraries",
							worker->bgw_name)));
		return;
	}

You could still introduce GUCs in _PG_init(), but they couldn't be
defined as PGC_POSTMASTER.  Also, you could still use
RegisterDynamicBackgroundWorker() to register a background worker, but
you couldn't use RegisterBackgroundWorker().  These might be
acceptable restrictions if swapping archive libraries on the fly seems
more important, but I wanted to bring that front and center to make
sure everyone understands the tradeoffs.

It's also entirely possible I'm misunderstanding something here...

Nathan



Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Julien Rouhaud
On Wed, Nov 3, 2021 at 00:18, Andrew Dunstan  wrote:

>
> On 11/2/21 12:09, Peter Geoghegan wrote:
> > On Tue, Nov 2, 2021 at 8:55 AM Robert Haas 
> wrote:
> >> I think shipping with log_checkpoints=on and
> >> log_autovacuum_min_duration=10m or so would be one of the best things
> >> we could possibly do to allow ex-post-facto troubleshooting of
> >> system-wide performance issues. The idea that users care more about
> >> the inconvenience of a handful of extra log messages than they do
> >> about being able to find problems when they have them matches no part
> >> of my experience. I suspect that many users would be willing to incur
> >> *way more* useless log messages than those settings would ever
> >> generate if it meant that they could actually find problems when and
> >> if they have them.
> > I fully agree.
>
>
> /metoo
>

same here



make tuplestore helper function

2021-11-02 Thread Melanie Plageman
Attached is a patch to create a helper function which creates a
tuplestore to be used by FUNCAPI-callable functions.

It was suggested in [1] that I start a separate thread for this patch.

A few notes:

There are a few places with very similar code to the new helper
(MakeTuplestore()) but which, for example, don't expect the return type
to be a row (pg_ls_dir(), pg_tablespace_databases(), text_to_table()). I
wasn't sure if it was worth modifying it for their benefit.

I wasn't sure if the elog() check for call result type should be changed
to an ereport().

All callers of MakeTuplestore() pass work_mem as the third parameter, so
I could just omit it and hard-code it in, but I wasn't sure that felt
right in a utility/helper function.

I inlined deflist_to_tuplestore() in foreign.c since it was billed as a
helper function but wasn't used elsewhere and it made it harder to use
MakeTuplestore() in this location.

-- melanie

[1] 
https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
From 69e41c23590637c171e985c700b8cfececbbf1f2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Tue, 2 Nov 2021 12:20:55 -0400
Subject: [PATCH v1] Add helper to make tuplestore

And use it to refactor out existing duplicate code.
---
 src/backend/access/transam/xlogfuncs.c | 27 +--
 src/backend/commands/event_trigger.c   | 56 +--
 src/backend/commands/extension.c   | 82 +-
 src/backend/commands/prepare.c | 34 ++---
 src/backend/foreign/foreign.c  | 53 +++---
 src/backend/libpq/hba.c| 31 +---
 src/backend/replication/logical/launcher.c | 27 +--
 src/backend/replication/logical/origin.c   | 25 +--
 src/backend/replication/slotfuncs.c| 27 +--
 src/backend/replication/walsender.c| 27 +--
 src/backend/storage/ipc/shmem.c| 27 +--
 src/backend/utils/adt/datetime.c   | 27 +--
 src/backend/utils/adt/genfile.c| 28 +---
 src/backend/utils/adt/mcxtfuncs.c  | 29 +---
 src/backend/utils/adt/pgstatfuncs.c| 80 +
 src/backend/utils/fmgr/funcapi.c   | 34 +
 src/backend/utils/misc/guc.c   | 26 +--
 src/backend/utils/mmgr/portalmem.c | 31 +---
 src/include/funcapi.h  |  1 +
 19 files changed, 72 insertions(+), 600 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index b98deb72ec..9b87296f2b 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -165,11 +165,8 @@ pg_stop_backup(PG_FUNCTION_ARGS)
 Datum
 pg_stop_backup_v2(PG_FUNCTION_ARGS)
 {
-	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
 	Datum		values[3];
 	bool		nulls[3];
 
@@ -178,29 +175,7 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 	XLogRecPtr	stoppoint;
 	SessionBackupState status = get_backup_status();
 
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
-	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
-	oldcontext = MemoryContextSwitchTo(per_query_ctx);
-
-	tupstore = tuplestore_begin_heap(true, false, work_mem);
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = tupstore;
-	rsinfo->setDesc = tupdesc;
-
-	MemoryContextSwitchTo(oldcontext);
+	tupstore = MakeTuplestore(fcinfo, &tupdesc, work_mem);
 
 	MemSet(values, 0, sizeof(values));
 	MemSet(nulls, 0, sizeof(nulls));
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index df264329d8..3ea5a9b851 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -1289,11 +1289,8 @@ EventTriggerSQLDropAddObject(const ObjectAddress *object, bool original, bool no
 Datum
 pg_event_trigger_dropped_objects(PG_FUNCTION_ARGS)
 {
-	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
 	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
 	slist_iter	iter;
 
 	/*
@@ -1306,30 +1303,7 @@ pg_event_trigger_dropped_objects(PG_FUNCTION_ARGS)
  errmsg("%s can only be called in a sql_drop event trigger function",
 		"pg_event_trigger_dropped_objects()")));
 
-	/* check to see if 

Re: refactoring basebackup.c

2021-11-02 Thread Robert Haas
On Tue, Nov 2, 2021 at 10:32 AM Robert Haas  wrote:
> Looks pretty good. I think you should work on stuff like documentation
> and tests, and I need to do some work on that stuff, too. Also, I
> think you should try to figure out how to support different
> compression levels.

On second thought, maybe we don't need to do this. There's a thread on
"Teach pg_receivewal to use lz4 compression" which concluded that
supporting different compression levels was unnecessary.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Teach pg_receivewal to use lz4 compression

2021-11-02 Thread Robert Haas
On Tue, Nov 2, 2021 at 8:17 AM Magnus Hagander  wrote:
> Um, why?
>
> That we are using zlib to provide the compression is an implementation 
> detail. Whereas AFAIK "gzip" refers to both the program and the format. And 
> we specifically use the gzxxx() functions in zlib, in order to produce gzip 
> format.
>
> I think for the end user, it is strictly better to name it "gzip", and given 
> that the target of this option is the end user we should do so. (It'd be 
> different if we were talking about a build-time parameter to configure).

I agree. Also, I think there's actually a file format called "zlib"
which is slightly different from the "gzip" format, and you have to be
careful not to generate the wrong one.
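The distinction is easy to demonstrate with Python's standard library (an illustration, unrelated to the patches in this thread): gzip framing (RFC 1952) starts with the magic bytes 1f 8b, while a raw zlib stream (RFC 1950) starts with a 0x78 header byte, and the two are not interchangeable.

```python
import gzip
import zlib

data = b"pg_receivewal compression example"

gz = gzip.compress(data)  # "gzip" framing, as produced by zlib's gz*() calls
zl = zlib.compress(data)  # "zlib" framing

print(gz[:2].hex())  # -> 1f8b (gzip magic bytes)
print(zl[:1].hex())  # -> 78   (zlib CMF header byte)

# Feeding one format to a decoder expecting the other fails:
try:
    gzip.decompress(zl)
except gzip.BadGzipFile:
    print("a zlib stream is not a valid gzip file")
```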

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Andrew Dunstan


On 11/2/21 12:09, Peter Geoghegan wrote:
> On Tue, Nov 2, 2021 at 8:55 AM Robert Haas  wrote:
>> I think shipping with log_checkpoints=on and
>> log_autovacuum_min_duration=10m or so would be one of the best things
>> we could possibly do to allow ex-post-facto troubleshooting of
>> system-wide performance issues. The idea that users care more about
>> the inconvenience of a handful of extra log messages than they do
>> about being able to find problems when they have them matches no part
>> of my experience. I suspect that many users would be willing to incur
>> *way more* useless log messages than those settings would ever
>> generate if it meant that they could actually find problems when and
>> if they have them.
> I fully agree.


/metoo


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: archive modules

2021-11-02 Thread Robert Haas
On Tue, Nov 2, 2021 at 12:10 PM Bossart, Nathan  wrote:
> Yes, that seems doable.  My point is that I've intentionally chosen to
> preload the libraries at the moment so that it's possible to define
> PGC_POSTMASTER GUCs and to use RegisterBackgroundWorker().  If we
> think that switching archive modules without restarting is more
> important, I believe we will need to take on a few restrictions.

I guess I'm failing to understand what the problem is. You can set
GUCs of the form foo.bar in postgresql.conf anyway, right?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: archive modules

2021-11-02 Thread Bossart, Nathan
On 11/2/21, 8:57 AM, "Robert Haas"  wrote:
> On Tue, Nov 2, 2021 at 11:26 AM Bossart, Nathan  wrote:
>> Well, the current patch does require a reload since the modules are
>> preloaded, but maybe there is some way to avoid that.
>
> I think we could set things up so that if the value changes, you call
> a shutdown hook for the old library, load the new one, and call any
> startup hook for that one.

Yes, that seems doable.  My point is that I've intentionally chosen to
preload the libraries at the moment so that it's possible to define
PGC_POSTMASTER GUCs and to use RegisterBackgroundWorker().  If we
think that switching archive modules without restarting is more
important, I believe we will need to take on a few restrictions.

Nathan



Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Peter Geoghegan
On Tue, Nov 2, 2021 at 8:55 AM Robert Haas  wrote:
> I think shipping with log_checkpoints=on and
> log_autovacuum_min_duration=10m or so would be one of the best things
> we could possibly do to allow ex-post-facto troubleshooting of
> system-wide performance issues. The idea that users care more about
> the inconvenience of a handful of extra log messages than they do
> about being able to find problems when they have them matches no part
> of my experience. I suspect that many users would be willing to incur
> *way more* useless log messages than those settings would ever
> generate if it meant that they could actually find problems when and
> if they have them.

I fully agree.

-- 
Peter Geoghegan




Re: archive modules

2021-11-02 Thread Robert Haas
On Tue, Nov 2, 2021 at 11:26 AM Bossart, Nathan  wrote:
> Well, the current patch does require a reload since the modules are
> preloaded, but maybe there is some way to avoid that.

I think we could set things up so that if the value changes, you call
a shutdown hook for the old library, load the new one, and call any
startup hook for that one.

That wouldn't work with a hook-based approach.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] fix references to like_regex

2021-11-02 Thread Gilles Darold

On 02/11/2021 16:50, Tom Lane wrote:
> Gilles Darold writes:
>> Since we have the regexp_like operator I have found that there are two
>> references in the documentation to PostgreSQL lacking a LIKE_REGEX
>> implementation. Here is a patch to fix the documentation. I simply
>> remove the reference to the non-existence of LIKE_REGEX in PostgreSQL in
>> chapter "9.7.3.8 Differences from XQuery", and try to modify chapter
>> "9.16.2.3. SQL/JSON Regular Expressions" to mention the REGEXP_LIKE
>> operator. For the second fix there may be better wording.
>
> I don't think we should change these (yet).  regexp_like() is *not*
> LIKE_REGEX, precisely because it's using a slightly different
> regular-expression language than what the spec calls for.
> At some point we may provide a skin for the regexp engine that
> duplicates the spec's definition, and then we can implement
> LIKE_REGEX for real.



Thanks for clarifying, I thought it was an oversight.


Regards

--
Gilles Darold



Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Robert Haas
On Sun, Oct 31, 2021 at 10:59 AM Tom Lane  wrote:
> The general policy at the moment is that a normally-functioning server
> should emit *no* log traffic by default (other than a few messages
> at startup and shutdown).  log_checkpoints is a particularly poor
> candidate for an exception to that policy, because it would produce so
> much traffic.  No DBA would be likely to consider it as anything but
> log spam.

That's absolutely false. On any system where there's anything actually
happening, there's going to be tons of stuff in the log because there
are going to be failed connection attempts, queries that result in
errors, and all kinds of other things like that. By any reasonable
metric, the log volume of log_checkpoints=on is tiny. It can't log
anything more than once per checkpoint interval, which means you're
not even talking about 1 message per minute. You can easily have
hundreds or thousands of errors or other log messages from user
activity *per second* and even on a relatively quiet system that stuff
is going to completely dwarf what you get from log_checkpoints. If
log_checkpoints=on is your biggest source of log output, what you need
isn't so much 'log_checkpoints=off' as 'pg_ctl stop'.

> > It seems the checkpoint stats, that are emitted to server logs when
> > the GUC log_checkpoints is enabled, are so important that a postgres
> > database provider would never want to disable the GUC.
>
> This statement seems ridiculous on its face.  If users need to wait
> with bated breath for a checkpoint report, we have something else
> we need to fix.

Besides appearing to be unwarranted mockery of what Bharath wrote,
this statement also seems to reflect a complete lack of understanding
of what is involved with maintaining a production system. Most of the
time, things just work, so you don't need to look at the logs at all.
But when things do go wrong, then you need some way to figure out what
the problem is. System-wide performance problems not linked to an
individual query are most often caused by either of two things:
checkpoints, and big autovacuum jobs. If you've set up logging for
those things in advance, then if and when the problem happens, you
will have a chance of being able to understand it and solve it
quickly. If you have not, the very first piece of advice you're going
to get from me is to (1) check whether there are any autovacuum
workers still running and if so figure out what they're doing and (2)
set log_checkpoints=on and log_autovacuum_min_duration to the smallest
value that isn't going to fill up your logs with too much garbage.

I can tell you from long experience with this sort of situation that
users do not love it. It means that they often cannot troubleshoot
problems of this type that happened in the past, because there's
simply no information in the log file that is of any use, and they
have to wait for the problem to recur... and I don't think it is
overstating anything to say that some of them probably DO wait with
bated breath, because they'd like whatever the issue is to get fixed!

I think shipping with log_checkpoints=on and
log_autovacuum_min_duration=10m or so would be one of the best things
we could possibly do to allow ex-post-facto troubleshooting of
system-wide performance issues. The idea that users care more about
the inconvenience of a handful of extra log messages than they do
about being able to find problems when they have them matches no part
of my experience. I suspect that many users would be willing to incur
*way more* useless log messages than those settings would ever
generate if it meant that they could actually find problems when and
if they have them. And these messages would in fact be the most
valuable thing in the log for a lot of users. What reasonable DBA
cares more about the fact that the application attempted an insert
that violated a foreign key constraint than they do about a checkpoint
that took 20 minutes to fsync everything? The former is expected; if
you thought that foreign key violations would never occur, you
wouldn't need to incur the expense of having the system enforce them.
The latter is unexpected and basically undiscoverable with default
settings.
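For reference, the defaults being proposed would amount to something like this in postgresql.conf (the 10-minute threshold is the value suggested above; adjust to taste):

```
# postgresql.conf -- illustrative values for the settings discussed above
log_checkpoints = on
log_autovacuum_min_duration = '10min'
```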

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] fix references to like_regex

2021-11-02 Thread Tom Lane
Gilles Darold  writes:
> Since we have the regexp_like operator I have found that there are two
> references in the documentation to PostgreSQL lacking a LIKE_REGEX
> implementation. Here is a patch to fix the documentation. I simply
> remove the reference to the non-existence of LIKE_REGEX in PostgreSQL in
> chapter "9.7.3.8 Differences from XQuery", and try to modify chapter
> "9.16.2.3. SQL/JSON Regular Expressions" to mention the REGEXP_LIKE
> operator. For the second fix there may be better wording.

I don't think we should change these (yet).  regexp_like() is *not*
LIKE_REGEX, precisely because it's using a slightly different
regular-expression language than what the spec calls for.
At some point we may provide a skin for the regexp engine that
duplicates the spec's definition, and then we can implement
LIKE_REGEX for real.

regards, tom lane




Re: inefficient loop in StandbyReleaseLockList()

2021-11-02 Thread Bossart, Nathan
On 11/2/21, 8:36 AM, "Tom Lane"  wrote:
> I've pushed the SyncPostCheckpoint change, and I think I'm going
> to stop here.  It's not clear that the remaining list_delete_first
> callers have any real problem; and changing them would be complex.
> We can revisit the question if we find out there is an issue.
> Or, if somebody else wants to pursue the issue, feel free.

Thanks!

Nathan



Re: inefficient loop in StandbyReleaseLockList()

2021-11-02 Thread Tom Lane
I've pushed the SyncPostCheckpoint change, and I think I'm going
to stop here.  It's not clear that the remaining list_delete_first
callers have any real problem; and changing them would be complex.
We can revisit the question if we find out there is an issue.
Or, if somebody else wants to pursue the issue, feel free.

regards, tom lane




Re: archive modules

2021-11-02 Thread Bossart, Nathan
On 11/2/21, 8:11 AM, "Robert Haas"  wrote:
> Why in the world would you want a plain hook rather than something
> closer to the way logical replication works?
>
> Plain hooks are annoying to use. If you load things at the wrong time,
> it silently doesn't work. It's also impossible to unload anything. If
> you want to change to a different module, you probably have to bounce
> the whole server instead of just changing the GUC and letting it load
> the new module when you run 'pg_ctl reload'.

Well, the current patch does require a reload since the modules are
preloaded, but maybe there is some way to avoid that.  However, I
agree with the general argument that a dedicated archive module
interface will be easier to use correctly.  With a hook, it could be
hard to know which library's archive function we are actually using.
With archive_library, we know the exact library's callback functions
we are using.

I think an argument for just adding a hook is that it's simpler and
easier to maintain.  But continuous archiving is a pretty critical
piece of functionality, so I think it's a great candidate for some
sturdy infrastructure.
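To make the contrast with a plain hook concrete, a dedicated interface along these lines would register a table of callbacks rather than overwrite a single function pointer. The names below are purely hypothetical illustrations of the approach, not the actual API from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback table an archive module might register.
 * These names are invented for illustration, not taken from the patch. */
typedef struct ArchiveCallbacks
{
	bool		(*check_configured) (void);
	bool		(*archive_file) (const char *file, const char *path);
} ArchiveCallbacks;

/* A toy module implementation. */
static int	files_archived = 0;

static bool
toy_check_configured(void)
{
	return true;
}

static bool
toy_archive_file(const char *file, const char *path)
{
	(void) file;
	(void) path;
	files_archived++;			/* pretend we copied and fsync'd the segment */
	return true;
}

/* What the archiver side might do with the registered table. */
static bool
run_archiver_once(const ArchiveCallbacks *cb, const char *file,
				  const char *path)
{
	if (!cb->check_configured())
		return false;
	return cb->archive_file(file, path);
}
```

Because the archiver always goes through one well-defined table, there is no ambiguity about which library's archive function is in effect.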

Nathan



Re: Non-superuser subscription owners

2021-11-02 Thread Robert Haas
On Mon, Nov 1, 2021 at 6:44 PM Mark Dilger  wrote:
> > ALTER SUBSCRIPTION..[ENABLE | DISABLE] do not synchronously start or stop 
> > subscription workers.  The ALTER command updates the catalog's subenabled 
> > field, but workers only lazily respond to that.  Disabling and enabling the 
> > subscription as part of the OWNER TO would not reliably accomplish anything.
>
> Having discussed this with Andrew off-list, we've concluded that updating the 
> documentation for logical replication to make this point more clear is 
> probably sufficient, but I wonder if anyone thinks otherwise?

The question in my mind is whether there's some reasonable amount of
time that a user should expect to have to wait for the changes to take
effect. If it could easily happen that the old permissions are still
in use a month after the change is made, I think that's probably not
good. If there's reason to think that, barring unusual circumstances,
changes will be noticed within a few minutes, I think that's fine.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: archive modules

2021-11-02 Thread Bossart, Nathan
I've just realized I forgot to CC the active participants on the last
thread to this one, so I've attempted to do that now.  I didn't
intentionally leave anyone out, but I'm sorry if I missed someone.

On 11/1/21, 10:24 PM, "Michael Paquier"  wrote:
> It seems to me that this patch is not moving into the right direction
> implementation-wise (I have read the arguments about
> backward-compatibility that led to the introduction of archive_library
> and its shell mode), for what looks like a duplicate of
> shared_preload_libraries but for an extra code path dedicated to the
> archiver, where we could just have a hook instead?  We have been
> talking for some time now to make the archiver process more 
> bgworker-ish, so as we finish with something closer to what the 
> logical replication launcher is.

Hm.  It sounds like you want something more like what was discussed
earlier in the other thread [0].  This approach would basically just
add a hook and call it a day.  My patch for this approach [1] moved
the shell archive logic to a test module, but the general consensus
seems to be that we'd need to have it be present in core (and the
default).

Nathan

[0] https://postgr.es/m/8B7BF404-29D4-4662-A2DF-1AC4C98463EB%40amazon.com
[1] https://postgr.es/m/attachment/127385/v2-0001-Replace-archive_command-with-a-hook.patch



Re: archive modules

2021-11-02 Thread Robert Haas
On Tue, Nov 2, 2021 at 1:24 AM Michael Paquier  wrote:
> It seems to me that this patch is not moving into the right direction
> implementation-wise (I have read the arguments about
> backward-compatibility that led to the introduction of archive_library
> and its shell mode), for what looks like a duplicate of
> shared_preload_libraries but for an extra code path dedicated to the
> archiver, where we could just have a hook instead?  We have been
> talking for some time now to make the archiver process more
> bgworker-ish, so as we finish with something closer to what the
> logical replication launcher is.

Why in the world would you want a plain hook rather than something
closer to the way logical replication works?

Plain hooks are annoying to use. If you load things at the wrong time,
it silently doesn't work. It's also impossible to unload anything. If
you want to change to a different module, you probably have to bounce
the whole server instead of just changing the GUC and letting it load
the new module when you run 'pg_ctl reload'.

Blech! :-)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.

2021-11-02 Thread Robert Haas
On Sat, Oct 23, 2021 at 5:45 PM Jeff Davis  wrote:
> Add new predefined role pg_maintenance, which can issue VACUUM,
> ANALYZE, CHECKPOINT.

Just as a sort of general comment on this endeavor, I suspect that any
attempt to lump things together that seem closely related is doomed to
backfire. There's bound to be somebody who wants to grant some of
these permissions and not others, or who wants to grant the ability to
run those commands on some tables but not others. That's kind of
unfortunate because it makes it more complicated to implement stuff
like this ... but I've more or less given up hope on getting away with
anything else.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: enabling FOO=bar arguments to vcregress.pl

2021-11-02 Thread Andrew Dunstan

On 11/1/21 21:23, Michael Paquier wrote:
> On Mon, Nov 01, 2021 at 11:33:21AM -0400, Andrew Dunstan wrote:
>> As I mentioned recently at
>> , 
>> I want to get USE_MODULE_DB working for vcregress.pl. I started out
>> writing code to strip this from the command line or get it from the
>> environment, but then it struck me that if would be better to implement
>> a general Makefile-like mechanism for handling FOO=bar type arguments on
>> the command line, along the lines of the attached.
> I am not sure to understand how that will that work with USE_MODULE_DB
> which sets up the database names used by the regression tests.  Each
> target's module has its own needs in terms of settings that can be
> used, meaning that you would still need some boilerplate to do the
> mapping between a variable and its command argument?



I think you misunderstood the purpose of my email. It wasn't meant to
be a complete patch.


But here's an untested patch that should do almost all of what I want
for USE_MODULE_DB.


cheers


andrew


-- 

Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index fc826da3ff..840931b210 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -39,6 +39,14 @@ if (-e "src/tools/msvc/buildenv.pl")
 	do "./src/tools/msvc/buildenv.pl";
 }
 
+my %settings;
+
+while (@ARGV && $ARGV[0] =~ /([A-Za-z]\w*)=(.*)/)
+{
+	$settings{$1} = $2;
+	shift;
+}
+
 my $what = shift || "";
 if ($what =~
 	/^(check|installcheck|plcheck|contribcheck|modulescheck|ecpgcheck|isolationcheck|upgradecheck|bincheck|recoverycheck|taptest)$/i
@@ -411,10 +419,12 @@ sub plcheck
 		print
 		  "\n";
 		print "Checking $lang\n";
+		my $dbname =  "pl_regression";
+		$dbname = "pl_regression_$lang" if $settings{USE_MODULE_DB};
 		my @args = (
 			"$topdir/$Config/pg_regress/pg_regress",
 			"--bindir=$topdir/$Config/psql",
-			"--dbname=pl_regression", @lang_args, @tests);
+			"--dbname=$dbname", @lang_args, @tests);
 		system(@args);
 		my $status = $? >> 8;
 		exit $status if $status;
@@ -692,6 +702,12 @@ sub fetchRegressOpts
 	{
 		push @opts, "--no-locale";
 	}
+	if ($settings{USE_MODULE_DB} && $m =~ /^\s*MODULE(?:S|_big)\s*=\s*(\S+)/m)
+	{
+		my $dbname = $1;
+		$dbname =~ s/plpython.*/plpython/;
+		push @opts, "--dbname=contrib_regression_$dbname";
+	}
 	return @opts;
 }
 


Re: archive modules

2021-11-02 Thread Bossart, Nathan
On 11/1/21, 9:44 PM, "Fujii Masao"  wrote:
> What is the main motivation of this patch? I was thinking that
> it's for parallelizing WAL archiving. But as far as I read
> the patch very briefly, WAL file name is still passed to
> the archive callback function one by one.

The main motivation is to provide a way to archive without shelling out.
This reduces the amount of overhead, which can improve archival rate
significantly.  It should also make it easier to archive more safely.
For example, many of the common shell commands used for archiving
won't fsync the data, but it isn't too hard to do so via C.  The
current proposal doesn't introduce any extra infrastructure for
batching or parallelism, but it is probably still possible.  I would
like to eventually add batching, but for now I'm only focused on
introducing basic archive module support.

> Are you planning to extend this mechanism to other WAL
> archiving-related commands like restore_command? I can imagine
> that those who use archive library (rather than shell) would
> like to use the same mechanism for WAL restore.

I would like to do this eventually, but my current proposal is limited
to archive_command.

> I think that it's worth adding this module into core
> rather than handling it as test module. It provides very basic
> WAL archiving feature, but (I guess) it's enough for some users.

Do you think it should go into contrib?

Nathan



[PATCH] fix references to like_regex

2021-11-02 Thread Gilles Darold

Hi,


Since we have the regexp_like operator I have found that there are two
references in the documentation to PostgreSQL lacking a LIKE_REGEX
implementation. Here is a patch to fix the documentation. I simply
remove the reference to the non-existence of LIKE_REGEX in PostgreSQL in
chapter "9.7.3.8 Differences from XQuery", and try to modify chapter
"9.16.2.3. SQL/JSON Regular Expressions" to mention the REGEXP_LIKE
operator. For the second fix there may be better wording.



Best regards,

--
Gilles Darold

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4b49dff2ff..cf3b2cc827 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -7353,26 +7353,12 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
 
 

-   Differences from XQuery (LIKE_REGEX)
-
-   
-LIKE_REGEX
-   
+   Differences from XQuery
 

 XQuery regular expressions

 
-
- Since SQL:2008, the SQL standard includes
- a LIKE_REGEX operator that performs pattern
- matching according to the XQuery regular expression
- standard.  PostgreSQL does not yet
- implement this operator, but you can get very similar behavior using
- the regexp_match() function, since XQuery
- regular expressions are quite close to the ARE syntax described above.
-
-
 
  Notable differences between the existing POSIX-based
  regular-expression feature and XQuery regular expressions include:
@@ -17456,11 +17442,11 @@ $[*] ? (@ like_regex "^[aeiou]" flag "i")
 
  The SQL/JSON standard borrows its definition for regular expressions
  from the LIKE_REGEX operator, which in turn uses the
- XQuery standard.  PostgreSQL does not currently support the
- LIKE_REGEX operator.  Therefore,
- the like_regex filter is implemented using the
- POSIX regular expression engine described in
- .  This leads to various minor
+ XQuery standard.  PostgreSQL supports the REGEXP_LIKE
+ operator, which is equivalent apart from some differences described in
+ . The like_regex
+ filter is implemented using the POSIX regular expression engine, see
+ . This leads to various minor
  discrepancies from standard SQL/JSON behavior, which are cataloged in
  .
  Note, however, that the flag-letter incompatibilities described there


Re: Added schema level support for publication.

2021-11-02 Thread Tomas Vondra



On 11/2/21 11:37 AM, Amit Kapila wrote:
> On Mon, Nov 1, 2021 at 5:52 PM Tomas Vondra
>  wrote:
>>
>> On 11/1/21 11:18, Amit Kapila wrote:
>>> On Mon, Nov 1, 2021 at 2:48 AM Tomas Vondra
>>>  wrote:
>>>> I wonder if it'd be better to just separate the schema and object type
>>>> specification, instead of mashing it into a single constant.

>>>
>>> Do you mean to say the syntax on the lines of Create Publication For
>>> Table t1, t2 Schema s1, s2;? If so, then originally the patch had the
>>> syntax on those lines but Tom pointed out that the meaning of such a
>>> syntax can change over a period of time and that can break apps [1]. I
>>> think the current syntax gives a lot of flexibility to users and we
>>> have some precedent for it as well.
>>>
>>
>> No, I'm not talking about the syntax at all - I'm talking about how we
>> represent it. PUBLICATIONOBJ_TABLE_CURRSCHEMA mixes the object type and
>> schema in the same constant, so I am wondering if we should just split
>> that into two pieces - one determining the schema, one determining the
>> object type. So PublicationObjSpec would have two fields instead of just
>> pubobjtype.
>>
>> The advantage would be we wouldn't need a whole lot of new constants for
>> each object type - adding sequences pretty much means adding
>>
>>  PUBLICATIONOBJ_SEQUENCE
>>  PUBLICATIONOBJ_SEQUENCE_IN_SCHEMA
>>  PUBLICATIONOBJ_SEQUENCE_CURRSCHEMA
>>
>> and after splitting we'd need just the first one.
>>
> 
> I see your point but OTOH, I think it will lead to additional checks
> in post-processing functions like ObjectsInPublicationToOids() as we
> have to always check both object type and schema to make decisions.
> 

True.

>> But maybe it's not
>> that bad, though. We don't expect all that many object types in
>> publications, I guess.
>>
> 
> Yeah, that is also true. So maybe at this, we can just rename the few
> types as suggested by you and we can look at it later if we anytime
> have more number of objects to add.
> 

+1

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Improve logging when using Huge Pages

2021-11-02 Thread Fujii Masao




On 2021/11/02 18:31, Shinoda, Noriyoshi (PN Japan FSIP) wrote:

> Fujii-san, Sawada-san,
>
> Thank you for your comment.
>
>> Also, with the patch, the log message is emitted also during initdb and
>> starting up in single user mode:
>
> Certainly the log output when executing the initdb command was noise.
> The attached patch reflects the comments and uses IsPostmasterEnvironment to
> suppress the output message.


Thanks for updating the patch!

+	ereport(IsPostmasterEnvironment ? LOG : NOTICE,
+			(errmsg("Anonymous shared memory was allocated without huge pages.")));

This change causes the log message to be output with NOTICE level
even when IsPostmasterEnvironment is false. But do we really want
to log that NOTICE message in that case? Instead, isn't it better
to just output the log message with LOG level only when
IsPostmasterEnvironment is true?


Justin and I posted other comments upthread. Could you consider
whether it's worth applying those comments to the patch?


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: refactoring basebackup.c

2021-11-02 Thread Robert Haas
On Tue, Nov 2, 2021 at 7:53 AM Jeevan Ladhe
 wrote:
> I have implemented the cleanup callback bbsink_lz4_cleanup() in the attached 
> patch.
>
> Please have a look and let me know of any comments.

Looks pretty good. I think you should work on stuff like documentation
and tests, and I need to do some work on that stuff, too. Also, I
think you should try to figure out how to support different
compression levels. For gzip, I did that by making gzip1..gzip9
possible compression settings. But that might not have been the right
idea because something like lz43 to mean lz4 at level 3 would be
confusing. Also, for the lz4 command line utility, there's not only
"lz4 -3" which means LZ4 with level 3 compression, but also "lz4
--fast=3" which selects "ultra-fast compression level 3" rather than
regular old level 3. And apparently LZ4 levels go up to 12 rather than
just 9 like gzip. I'm thinking maybe we should go with something like
"gzip@9" rather than just "gzip9" to mean gzip with compression level
9, and then things like "lz4@3" or "lz4@fast3" would select either the
regular compression levels or the ultra-fast compression levels.
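The "algorithm@level" idea could be parsed along these lines. This is a sketch with invented names and limits, not code from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative parser for compression specs like "gzip", "gzip@9",
 * "lz4@3", or "lz4@fast3".  Names are invented for the sketch. */
typedef struct CompressionSpec
{
	char		algorithm[16];
	int			level;			/* -1 means "use the default level" */
	bool		fast;			/* lz4 --fast mode */
} CompressionSpec;

static bool
parse_compression_spec(const char *s, CompressionSpec *spec)
{
	const char *at = strchr(s, '@');
	size_t		alen = at ? (size_t) (at - s) : strlen(s);

	if (alen == 0 || alen >= sizeof(spec->algorithm))
		return false;
	memcpy(spec->algorithm, s, alen);
	spec->algorithm[alen] = '\0';
	spec->level = -1;
	spec->fast = false;

	if (at)
	{
		const char *lvl = at + 1;
		char	   *end;
		long		v;

		if (strncmp(lvl, "fast", 4) == 0)
		{
			spec->fast = true;
			lvl += 4;
		}
		v = strtol(lvl, &end, 10);
		if (end == lvl || *end != '\0' || v < 0)
			return false;		/* "@" with no valid number */
		spec->level = (int) v;
	}
	return true;
}
```

Range checking (gzip 1..9, lz4 1..12, etc.) would then be per-algorithm, applied after the spec is split.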

Meanwhile, I think it's probably OK for me to go ahead and commit
0001-0003 from my patches at this point, since it seems we have pretty
good evidence that the abstraction basically works, and there doesn't
seem to be any value in holding off and maybe having to do a bunch
more rebasing. We may also want to look into making -Fp work with
--server-compression, which would require pg_basebackup to know how to
decompress. I'm actually not sure if this is worthwhile; you'd need to
have a network connection slow enough that it's worth spending a lot
of CPU time compressing on the server and decompressing on the client
to make up for the cost of network transfer. But some people might
have that case. It might make it easier to test this, too, since we
probably can't rely on having an LZ4 binary installed. Another thing
that you probably need to investigate is also supporting client-side
LZ4 compression. I think that is probably a really desirable addition
to your patch set, since people might find it odd if that were
exclusively a server-side option. Hopefully it's not that much work.

One minor nitpick in terms of the code:

+ mysink->bytes_written = mysink->bytes_written + headerSize;

I would use += here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: inefficient loop in StandbyReleaseLockList()

2021-11-02 Thread Tom Lane
Kyotaro Horiguchi  writes:
> At Mon, 01 Nov 2021 18:01:18 -0400, Tom Lane  wrote in 
>> So what I did in the attached is add a "canceled" flag to
>> PendingUnlinkEntry, which lets us deal with canceled or finished
>> entries without having to delete them from the list right away.
>> Then we only need to physically clean up the list once per
>> SyncPostCheckpoint call.

> We don't loop over so many canceled elements usually so I think it
> works well. However, shouldn't we consider canceled before checking
> cycle_ctr?

Good point.  I was thinking that it's best to break out of the loop
at the first opportunity.  But if the first few entries with the next
cycle_ctr value are canceled, it's best to advance over them in the
current SyncPostCheckpoint call.  It saves nothing to postpone that
work to later, and indeed adds a few cycles by leaving more data to
be copied by list_delete_first_n.  Will change it.

> I feel that we might need to wipe_mem for the memmove case as well
> (together with list_delete_nth_cell) but that is another thing even if
> that's correct.

Hm.  I made this function by copying-and-modifying list_delete_nth_cell,
so if there's something wrong there then this code inherited it.  But
I don't think it's wrong.  The wipe_mem business is only intended to
be used when enabling expensive debug options.

> Otherwise it looks good to me.

Thanks for looking!

regards, tom lane




Re: pgbench bug candidate: negative "initial connection time"

2021-11-02 Thread Fujii Masao




On 2021/11/01 23:01, Fujii Masao wrote:

>> The remainings are the changes of handling of initial connection or
>> logfile open failures. I agree to push them at least for the master.
>> But I'm not sure if they should be back-patched. Without these changes,
>> even when those failures happen, pgbench proceeds the benchmark and
>> reports the result. But with the changes, pgbench exits immediately in
>> that case. I'm not sure if there are people who expect this behavior,
>> but if there are, maybe we should not change it at least at stable branches.
>> Thought?
>
> The current behavior should be improved, but not a bug.
> So I don't think that the patch needs to be back-patched.
> Barring any objection, I will push the attached patch to the master.


Pushed. Thanks!

I also pushed the typo-fix patch that I proposed upthread.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Feature request for adoptive indexes

2021-11-02 Thread Pavel Borisov
On Tue, 2 Nov 2021 at 16:04, Hayk Manukyan wrote:

> Tomas Vondra
> > Are you suggesting those are not the actual best/worst cases and we
> > should use some other indexes? If yes, which ones?
>
> I would say yes.
> In my case I am not querying only sequence column.
> I have the following cases which I want to optimize.
> 1. Select * from Some_table where job =  and nlp = 
> and year =  and  *scan_id =  *
> 2. Select * from Some_table where job =  and nlp = 
> and year =  and  *Issue_flag =  *
> 3. Select * from Some_table where job =  and nlp = 
> and year =  and  *sequence =  *
> Those are queries that my app send to db that is why I said that from *read
> perspective* our *best case* is 3 separate indexes for
>  *(job, nlp, year, sequence)* AND *(job, nlp, year, Scan_ID)* and *(job,
> nlp, year,  issue_flag)*  and any other solution like
>  (job, nlp, year, sequence, Scan_ID, issue_flag) OR  (job, nlp, year )
> INCLUDE(sequence, Scan_ID, issue_flag)  OR just (job, nlp, year) can be
> considered as* worst case *
> I will remind that in real world scenario I have ~50m rows and about *~5k
> rows for each (job, nlp, year )*
>

 So you get 50M rows / 5K rows = 10K times selectivity when you select on
job, nlp, and year, which is
enormous. Then you should select some of the 5K rows left, which is
expected to be pretty fast with a bitmap index scan or INCLUDE-column
filtering. It confirms Tomas's experiment

  pgbench -n -f q4.sql -T 60

106 ms vs 109 ms

fits your case pretty well. You get absolutely negligible difference
between best and worst case and certainly you don't need anything more than
just plain index for 3 columns, you even don't need INCLUDE index.
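
To illustrate the point, a plain three-column index serves all three query
shapes via a prefix scan (a sketch; the table and column names are taken from
the thread, the literal values are placeholders I made up):

```sql
-- One ordinary composite index; all three queries below use its
-- (job, nlp, year) prefix, leaving only ~5K rows to filter.
CREATE INDEX some_table_jny_idx ON some_table (job, nlp, year);

SELECT * FROM some_table
 WHERE job = 1 AND nlp = 2 AND year = 2021 AND scan_id = 42;
SELECT * FROM some_table
 WHERE job = 1 AND nlp = 2 AND year = 2021 AND issue_flag = true;
SELECT * FROM some_table
 WHERE job = 1 AND nlp = 2 AND year = 2021 AND sequence = 7;
```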

From what I read, I suppose this feature isn't really based on a real
need. If you think it is useful, please feel free to make and post here
some measurements that prove your point.




-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: Feature request for adoptive indexes

2021-11-02 Thread Tomas Vondra




On 11/2/21 13:04, Hayk Manukyan wrote:

Tomas Vondra
 > Are you suggesting those are not the actual best/worst cases and we
 > should use some other indexes? If yes, which ones?

I would say yes.
In my case I am not querying only sequence column.
I have the following cases which I want to optimize.
1. Select * from Some_table where job =  and nlp =  
and year =  and *scan_id =  *
2. Select * from Some_table where job =  and nlp =  
and year =  and *Issue_flag =  *
3. Select * from Some_table where job =  and nlp =  
and year =  and *sequence =  *
Those are queries that my app send to db that is why I said that from 
*read perspective* our *best case* is 3 separate indexes for
*(job, nlp, year, sequence)* AND *(job, nlp, year, Scan_ID)* and *(job, 
nlp, year,  issue_flag)*  and any other solution like
  (job, nlp, year, sequence, Scan_ID, issue_flag) OR  (job, nlp, year ) 
INCLUDE(sequence, Scan_ID, issue_flag)  OR just (job, nlp, year) can be 
considered as *worst case*


I already explained why using INCLUDE in this case is the wrong thing to 
do, it'll harm performance compared to just defining a regular index.


I will remind that in real world scenario I have ~50m rows and about 
*~5k rows for each (job, nlp, year )*


Well, maybe this is the problem. We have 50M rows, but the three columns 
have too many distinct values - (job, nlp, year) defines ~50M groups, so 
there's only a single row per group. That'd explain why the two indexes 
perform almost equally.


So I guess you need to modify the data generator so that the data set is 
more like the case you're trying to improve.
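
One way to build such a data set (a sketch under the assumptions stated in the
thread: ~10K distinct (job, nlp, year) groups with ~5K rows each, so ~50M rows
total; the table and column names follow the thread, everything else is made up):

```sql
-- Hypothetical generator: 100 × 10 × 10 = 10,000 groups × 5,000 rows = 50M rows.
CREATE TABLE some_table AS
SELECT j AS job,
       n AS nlp,
       y AS year,
       s AS sequence,
       (random() * 100000)::int AS scan_id,  -- arbitrary filler values
       (random() < 0.01)        AS issue_flag
FROM generate_series(1, 100)       AS j,
     generate_series(1, 10)        AS n,
     generate_series(2012, 2021)   AS y,
     generate_series(1, 5000)      AS s;
```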


 From a *write perspective*, as far as we want to have only one index 
our *best case* can be considered any of
*(job, nlp, year, sequence, Scan_ID, issue_flag)* OR *(job, nlp, year ) 
INCLUDE(sequence, Scan_ID, issue_flag)*
and the *worst case* will be having 3 separate queries like in the read 
perspective
(job, nlp, year, sequence) AND (job, nlp, year, Scan_ID) and (job, nlp, 
year,  issue_flag)




Maybe. It's true a write with three indexes will require modification to 
three leaf pages (on average). With a single index we have to modify 
just one leaf page, depending on where the row gets routed.


But with the proposed "merged" index, the row will have to be inserted 
into three smaller trees. If the trees are large enough, they won't fit 
into a single leaf page (e.g. the 5000 index tuples is guaranteed to 
need many pages, even if you use some smart encoding). So the write will 
likely need to modify at least 3 leaf pages, getting much closer to 
three separate indexes. At which point you could just use three indexes.


So I think the comparison that we did is not right because we are 
comparing different/wrong things.

 > For correct results we need to compare these two cases while doing
random queries 1, 2, 3 and random writes.



I'm not going to spend any more time on tweaking the benchmark, but if 
you tweak it to demonstrate the difference / benefits I'll run it again 
on my machine etc.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Multi-Column List Partitioning

2021-11-02 Thread Nitin Jadhav
> I noticed that there's no commitfest entry for this.  Will you please
> add this to the next one?

I have added it to Nov commitfest.

Thanks & Regards,
Nitin Jadhav

On Fri, Oct 29, 2021 at 1:40 PM Amit Langote  wrote:
>
> Hi Nitin,
>
> On Fri, Oct 22, 2021 at 6:48 PM Nitin Jadhav
>  wrote:
> > Thanks for sharing. I have fixed the issue in the attached patch.
>
> I noticed that there's no commitfest entry for this.  Will you please
> add this to the next one?
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com




Re: removing global variable ThisTimeLineID

2021-11-02 Thread Robert Haas
On Mon, Nov 1, 2021 at 11:33 PM Michael Paquier  wrote:
> +   /*
> +* If we're writing and flushing WAL, the time line can't be changing,
> +* so no lock is required.
> +*/
> +   if (insertTLI)
> +   *insertTLI = XLogCtl->ThisTimeLineID;
> In 0002, there is no downside in putting that in the spinlock section
> either, no?  It seems to me that we should be able to call this one
> even in recovery, as flush LSNs exist while in recovery.

I don't think it matters very much from a performance point of view,
although putting a branch inside of a spinlock-protected section may
be undesirable. My bigger issues with this kind of idea is that we
don't want to use spinlocks as a "security blanket" (as in Linus from
Peanuts). Every time we use a spinlock, or any other kind of locking
mechanism, we should have a clear idea what we're protecting ourselves
against. I'm quite suspicious that there are a number of places where
we're taking spinlocks in xlog.c just to feel virtuous, and not
because it does anything useful, and I don't particularly want to
increase the number of such places.

Also, XLogCtl->ThisTimeLineID isn't set until the end of recovery, so
calling this function during recovery would be a mistake. There seem
to be a number of interface functions in xlog.c that should only be
called when not in recovery, and neither their names nor their
comments make that clear. I wasn't bold enough to go label all the
ones I think fall into that category, but maybe we should. Honestly,
the naming of things in this file is annoyingly bad in general. My
favorite example, found during this project, is:

#define ConvertToXSegs(x, segsize)  XLogMBVarToSegs((x), (segsize))

It's not good enough to have one name that's difficult to understand
or remember or capitalize properly -- let's have TWO -- for the same
thing!

> In 0007, XLogFileClose() should reset openLogTLI.  The same can be
> said in BootStrapXLOG() before creating the control file.  It may be
> cleaner here to introduce an InvalidTimelineId, as discussed a couple
> of days ago.

I thought about that, but I think it's pointless. I think we can just
say that if openLogFile == -1, openLogSegNo and openLogTLI are
meaningless. I don't think we're currently resetting openLogSegNo to
an invalid value either, so I don't think openLogTLI should be treated
differently. I'm not opposed to introducing InvalidTimeLineID on
principle, and I looked for a way of doing it in this patch set, but
it didn't seem all that valuable, and I feel that someone who cares
about it can do it as a separate effort after I commit these.
Honestly, I think these patches greatly reduce the need to be afraid
of leaving the TLI unset, because they remove the crutch of hoping
that ThisTimeLineID is initialized to the proper value. You actually
have to think about which TLI you are going to pass. Now I'm very sure
further improvement is possible, but I think this patch set is complex
enough already, and I'd like to get it committed before contemplating
any other work in this area.

> Some comments at the top of recoveryTargetTLIRequested need a refresh
> with their references to ThisTimelineID.

After thinking this over, I think we should just delete the entire
first paragraph, so that the comments start like this:

/*
 * recoveryTargetTimeLineGoal: what the user requested, if any
 *
 * recoveryTargetTLIRequested: numeric value of requested timeline, if constant

I don't see that we'd be losing anything that way. When you look at
that paragraph now, the first sentence is just confusing the issue,
because we actually don't care about XLogCtl->ThisTimeLineID during
recovery, because it's not set. We do care about plain old
ThisTimeLineID, but that's going to be demoted to a local variable by
these patches, and there just doesn't seem to be any reason to discuss
it here. The second sentence is wrong already, because the rmgr
routines do not rely on ThisTimeLineID. And the third sentence is also
wrong and/or confusing, because not all of the variables that follow
are TLIs -- only 3 of 5 are. I don't actually think there's any useful
introduction we need to provide here; we just need to tell people what
the variables we're about to declare do.

> The handling of XLogCtl->ThisTimeLineID still seems a bit blurry when
> it comes to timeline switches because we don't update it while in
> recovery when replaying a end-of-recovery record.  This could at least
> be documented better.

We don't update it during recovery at all. There's exactly one
assignment statement for that variable and it's here:

/* Save the selected TimeLineID in shared memory, too */
XLogCtl->ThisTimeLineID = ThisTimeLineID;
XLogCtl->PrevTimeLineID = PrevTimeLineID;

That's after the main redo loop completes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Teach pg_receivewal to use lz4 compression

2021-11-02 Thread Magnus Hagander
On Tue, Nov 2, 2021 at 9:51 AM Michael Paquier  wrote:

> On Tue, Nov 02, 2021 at 07:27:50AM +0900, Michael Paquier wrote:
> > On Mon, Nov 01, 2021 at 08:39:59AM +, gkokola...@pm.me wrote:
> > > Agreed.
> > >
> > > I have already started on v8 of the patch with that technique. I should
> > > be able to update the thread soon.
> >
> > Nice, thanks!
>
> By the way, I was reading the last version of the patch today, and
> it seems to me that we could make the commit history cleaner if we split the
> patch into two parts:
> - One that introduces the new option --compression-method and
> is_xlogfilename(), while reworking --compress (including documentation
> changes).
> - One to have LZ4 support.
>
> v7 has been using "gzip" for the option name, but I think that it
> would be more consistent to use "zlib" instead.
>

Um, why?

That we are using zlib to provide the compression is an implementation
detail. Whereas AFAIK "gzip" refers to both the program and the format. And
we specifically use the gzxxx() functions in zlib, in order to produce gzip
format.

I think for the end user, it is strictly better to name it "gzip", and
given that the target of this option is the end user we should do so. (It'd
be different if we were talking about a build-time parameter to configure).

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


RE: Failed transaction statistics to measure the logical replication progress

2021-11-02 Thread osumi.takami...@fujitsu.com
On Monday, November 1, 2021 10:18 PM I wrote:
> On Thursday, October 28, 2021 11:19 PM I wrote:
> > I've created a new patch that extends pg_stat_subscription_workers to
> > include other transaction statistics.
> >
> > Note that this patch depends on v18 patch-set in [1]...
> Rebased based on the v19 in [1].
I'd like to add one more explanation about the latest patches for review.


On Thursday, October 28, 2021 11:19 PM I wrote:
> (2)
> Updates of stats on the view at either commit prepared or rollback prepared
> time.
> This means we don't lose the prepared transaction size even after a server restart
> and the user can see the results of two-phase operations at those timings.
There is another reason I had to treat the 'prepare' message as above.
A reviewer might think that sending the prepared bytes to the stats collector
and making the bytes survive a restart is too much.

But, we don't know whether a prepared transaction will be processed
by commit prepared or rollback prepared at 'prepare' time.
An execution of commit prepared needs to update the columns xact_commit and
xact_commit_bytes, while rollback prepared updates the columns xact_abort
and xact_abort_bytes, which this patch introduces.

Therefore, even though we can calculate the bytes of a prepared txn at prepare time,
it's *not* possible to add the transaction bytes to either of the byte-size
stats columns and clean them up at that point. Hence, there was a premise
that we should not lose the prepared txn bytes across a shutdown,
so my patch ended up with the implementation described above.


Best Regards,
Takamichi Osumi



Re: Feature request for adoptive indexes

2021-11-02 Thread Hayk Manukyan
Tomas Vondra
> Are you suggesting those are not the actual best/worst cases and we
> should use some other indexes? If yes, which ones?

I would say yes.
In my case I am not querying only sequence column.
I have the following cases which I want to optimize.
1. Select * from Some_table where job =  and nlp = 
and year =  and  *scan_id =  *
2. Select * from Some_table where job =  and nlp = 
and year =  and  *Issue_flag =  *
3. Select * from Some_table where job =  and nlp = 
and year =  and  *sequence =  *
Those are the queries that my app sends to the db; that is why I said that from a *read
perspective* our *best case* is 3 separate indexes for
 *(job, nlp, year, sequence)* AND *(job, nlp, year, Scan_ID)* and *(job,
nlp, year,  issue_flag)*  and any other solution like
 (job, nlp, year, sequence, Scan_ID, issue_flag) OR  (job, nlp, year )
INCLUDE(sequence, Scan_ID, issue_flag)  OR just (job, nlp, year) can be
considered as* worst case *
I will remind that in real world scenario I have ~50m rows and about *~5k
rows for each (job, nlp, year )*
From a *write perspective*, as far as we want to have only one index, our *best
case* can be considered any of
*(job, nlp, year, sequence, Scan_ID, issue_flag)* OR *(job, nlp, year )
INCLUDE(sequence, Scan_ID, issue_flag)*
and the *worst case* will be having 3 separate queries like in the read
perspective
(job, nlp, year, sequence) AND (job, nlp, year, Scan_ID) and (job, nlp,
year,  issue_flag)

So I think the comparison that we did is not right because we are comparing
different/wrong things.

For correct results we need to compare these two cases while doing
random queries 1, 2, 3 and random writes.


Tue, 2 Nov 2021 at 01:16, Tomas Vondra :

>
>
> On 11/1/21 21:06, Robert Haas wrote:
> > On Tue, Oct 26, 2021 at 11:11 AM Tomas Vondra
> >  wrote:
> >> If I get what you propose, you want to have a "top" tree for (job, nlp,
> >> year), which "splits" the data set into subsets of ~5000-7000 rows. And
> >> then for each subset you want a separate "small" trees on each of the
> >> other columns, so in this case three trees.
> >>
> >> Well, the problem with this is pretty obvious - each of the small trees
> >> requires separate copies of the leaf pages. And remember, in a btree the
> >> internal pages are usually less than 1% of the index, so this pretty
> >> much triples the size of the index. And if you insert a row into the
> >> index, it has to insert the item pointer into each of the small trees,
> >> likely requiring a separate I/O for each.
> >>
> >> So I'd bet this is not any different from just having three separate
> >> indexes - it doesn't save space, doesn't save I/O, nothing.
> >
> > I agree. In a lot of cases, it's actually useful to define the index
> > on fewer columns, like (job, nlp, year) or even just (job, nlp) or
> > even just (job) because it makes the index so much smaller and that's
> > pretty important. If you have enough duplicate entries in a (job, nlp,
> > year) index to justify create a (job, nlp, year, sequence) index, the
> > number of distinct (job, nlp, year) tuples has to be small compared to
> > the number of (job, nlp, year, sequence) tuples - and that means that
> > you wouldn't actually save much by combining your (job, nlp, year,
> > sequence) index with a (job, nlp, year, other-stuff) index. As you
> > say, the internal pages aren't the problem.
> >
> > I don't intend to say that there's no possible use for this kind of
> > technology. Peter G. says that people are writing research papers
> > about that and they probably wouldn't be doing that unless they'd
> > found some case where it's a big win. But this example seems extremely
> > unconvincing.
> >
>
> I actually looked at the use case mentioned by Peter G, i.e. merged
> indexes with master-detail clustering (see e.g. [1]), but that seems
> like a rather different thing. The master-detail refers to storing rows
> from multiple tables, interleaved in a way that allows faster joins. So
> it's essentially a denormalization tool.
>
> Perhaps there's something we could learn about efficient storage of the
> small trees, but I haven't found any papers describing that (I haven't
> spent much time on the search, though).
>
> [1] Algorithms for merged indexes, Goetz Graefe
>  https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.7709
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: refactoring basebackup.c

2021-11-02 Thread Jeevan Ladhe
I have implemented the cleanup callback bbsink_lz4_cleanup() in the
attached patch.


Please have a look and let me know of any comments.


Regards,

Jeevan Ladhe

On Fri, Oct 29, 2021 at 6:54 PM Robert Haas  wrote:

> On Fri, Oct 29, 2021 at 8:59 AM Jeevan Ladhe
>  wrote:
> > bbsink_gzip_ops have the cleanup() callback set to NULL, and when the
> > bbsink_cleanup() callback is triggered, it tries to invoke a function
> that
> > is NULL. I think either bbsink_gzip_ops should set the cleanup callback
> > to bbsink_forward_cleanup or we should be calling the cleanup() callback
> > from PG_CATCH instead of PG_FINALLY()? But in the latter case, even if
> > we call from PG_CATCH, it will have a similar problem for gzip and other
> > sinks which may not need a custom cleanup() callback in case there is any
> > error before the backup could finish up normally.
> >
> > I have attached a patch to fix this.
>
> Yes, this is the right fix. Apologies for the oversight.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


lz4_compress_v7.patch
Description: Binary data


Re: Teach pg_receivewal to use lz4 compression

2021-11-02 Thread gkokolatos


‐‐‐ Original Message ‐‐‐

On Tuesday, November 2nd, 2021 at 9:51 AM, Michael Paquier 
 wrote:

> On Tue, Nov 02, 2021 at 07:27:50AM +0900, Michael Paquier wrote:
> > On Mon, Nov 01, 2021 at 08:39:59AM +, gkokola...@pm.me wrote:
> > > Agreed.
> > >
> > > I have already started on v8 of the patch with that technique. I should
> > > be able to update the thread soon.
> >
> > Nice, thanks!
>

A pleasure. Please find it in the attached v8-0002 patch.

> By the way, I was reading the last version of the patch today, and
> it seems to me that we could make the commit history cleaner if we split the
> patch into two parts:
> - One that introduces the new option --compression-method and
> is_xlogfilename(), while reworking --compress (including documentation
> changes).
> - One to have LZ4 support.

Agreed.

>
> v7 has been using "gzip" for the option name, but I think that it
> would be more consistent to use "zlib" instead.

Done.

Cheers,
//Georgios

> --
> Michael
From dc33f7ea2930dfc5a7bace52eb0086c759a7d86e Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Tue, 2 Nov 2021 10:39:42 +
Subject: [PATCH v8 1/2] Refactor pg_receivewal in preparation for introducing
 lz4 compression

The program pg_receivewal can use gzip compression to store the received WAL.
The option `--compress` with a value [1, 9] was used to denote that gzip
compression was required. When `--compress` with a value of `0` was used, then
no compression would take place.

This commit introduces a new option, `--compression-method`. Valid values are
[none|zlib]. The option `--compress` requires `--compression-method` to be set
to a value other than `none`. Also, `--compress=0` now returns an error.

Under the hood, there are no surprising changes. A new enum WalCompressionMethod
has been introduced and is used throughout the relevant codepaths to explicitly
note which compression method to use.

Last, the macros IsXLogFileName and friends have been replaced by the function
is_xlogfilename(). This will allow easier expansion of the available
compression methods that can be recognised.
---
 doc/src/sgml/ref/pg_receivewal.sgml  |  24 ++-
 src/bin/pg_basebackup/pg_basebackup.c|   7 +-
 src/bin/pg_basebackup/pg_receivewal.c| 164 +--
 src/bin/pg_basebackup/receivelog.c   |   2 +-
 src/bin/pg_basebackup/t/020_pg_receivewal.pl |  16 +-
 src/bin/pg_basebackup/walmethods.c   |  51 --
 src/bin/pg_basebackup/walmethods.h   |  15 +-
 src/tools/pgindent/typedefs.list |   1 +
 8 files changed, 200 insertions(+), 80 deletions(-)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 9fde2fd2ef..f95cffcd5e 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -263,15 +263,31 @@ PostgreSQL documentation
   
  
 
+ 
+  --compression-method=method
+  
+   
+Enables compression of write-ahead logs using the specified method.
+Supported values are zlib
+and none.
+   
+  
+ 
+
  
   -Z level
   --compress=level
   

-Enables gzip compression of write-ahead logs, and specifies the
-compression level (0 through 9, 0 being no compression and 9 being best
-compression).  The suffix .gz will
-automatically be added to all filenames.
+Specifies the compression level (1 through
+9, 1 being worst compression
+and 9 being best compression) for
+zlib compressed WAL segments.
+   
+
+   
+This option requires --compression-method to be
+specified with zlib.

   
  
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 27ee6394cf..cdea3711b7 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -555,10 +555,13 @@ LogStreamerMain(logstreamer_param *param)
 	stream.replication_slot = replication_slot;
 
 	if (format == 'p')
-		stream.walmethod = CreateWalDirectoryMethod(param->xlog, 0,
+		stream.walmethod = CreateWalDirectoryMethod(param->xlog,
+	COMPRESSION_NONE, 0,
 	stream.do_sync);
 	else
-		stream.walmethod = CreateWalTarMethod(param->xlog, compresslevel,
+		stream.walmethod = CreateWalTarMethod(param->xlog,
+			  COMPRESSION_NONE, /* ignored */
+			  compresslevel,
 			  stream.do_sync);
 
 	if (!ReceiveXlogStream(param->bgconn, ))
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index 04ba20b197..9641f4a2f4 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -32,6 +32,9 @@
 /* Time to sleep between reconnection attempts */
 #define RECONNECT_SLEEP_TIME 5
 
+/* This is just the redefinition of a libz constant */
+#define Z_DEFAULT_COMPRESSION (-1)
+
 /* Global options */
 static char *basedir = NULL;
 

Re: RFC: Logging plan of the running query

2021-11-02 Thread Ekaterina Sokolova

Hi!

I'm here to answer your questions about contrib/pg_query_state.

I only took a quick look at pg_query_state, I have some questions.



pg_query_state seems using shm_mq to expose the plan information, but
there was a discussion that this kind of architecture would be tricky
to do properly [1].
Does pg_query_state handle difficulties listed on the discussion?
[1] 
https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com


I doubt that it was the right link.
But on the topic, I will say that the extension really uses shared memory; 
interaction is implemented by sending/receiving messages. This 
architecture provides the required reliability and convenience.



It seems the caller of the pg_query_state() has to wait until the
target process pushes the plan information into shared memory, can it
lead to deadlock situations?
I came up with this question because when trying to make a view for
memory contexts of other backends, we encountered deadlock situations.
After all, we gave up view design and adopted sending signal and
logging.


Discussion at the following URL.
https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com


Before extracting information about another process, we check its state. 
Information will only be retrieved from a process willing to provide it. 
Otherwise, we will receive an error message saying that query execution 
statistics and process status cannot be obtained. There is also a check 
for the case of extracting your own status. This is even verified in tests.


Thanks for your attention.
Just in case, I am ready to discuss this topic in more detail.

About overhead:

I haven't measured it yet, but I believe that the overhead for backends
which are not called pg_log_current_plan() would be slight since the
patch just adds the logic for saving QueryDesc on ExecutorRun().
The overhead for backends which is called pg_log_current_plan() might
not slight, but since the target process are assumed dealing with
long-running query and the user want to know its plan, its overhead
would be worth the cost.
I think it would be useful for us to have a couple of examples with 
different numbers of rows, compared to running without this functionality.


Hope this helps.

--
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Should we support new definition for Identity column : GENERATED BY DEFAULT ON NULL?

2021-11-02 Thread Himanshu Upadhyaya
Hi,

Trying to insert a NULL value into an identity column defined as "GENERATED BY
DEFAULT" is disallowed, but there can be use cases where the user would
like an identity column into which manual NULL insertion is allowed (and
Postgres should not error out).

How about having a new type for the identity column, "GENERATED BY
DEFAULT ON NULL", which will allow manual NULL insertion, with the NULL
value internally replaced by the sequence's next value?

ORACLE is supporting this feature by having a similar Identity column type
as below:
===
SQL> CREATE TABLE itest1 (id1 INTEGER GENERATED BY DEFAULT ON NULL
AS IDENTITY, id2 INTEGER);

Table created.

SQL> INSERT INTO itest1 VALUES (NULL, 10);--Supported with GENERATED BY
DEFAULT ON NULL

1 row created.

SQL> INSERT INTO itest1 VALUES (1,30);

1 row created.

SQL> INSERT INTO itest1 (id2) VALUES (20);

1 row created.

SQL> SELECT * FROM itest1;

       ID1        ID2
---------- ----------
         1         10
         1         30
         2         20
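
For comparison, current PostgreSQL rejects an explicit NULL here, because
identity columns are implicitly NOT NULL (a sketch reusing the table from the
Oracle example above):

```sql
CREATE TABLE itest1 (id1 integer GENERATED BY DEFAULT AS IDENTITY, id2 integer);

INSERT INTO itest1 (id2) VALUES (20);  -- OK: id1 is filled from the sequence
INSERT INTO itest1 VALUES (1, 30);     -- OK: explicit value allowed by BY DEFAULT
INSERT INTO itest1 VALUES (NULL, 10);  -- ERROR: null value in column "id1"
                                       -- violates not-null constraint
```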


I think it is good to have support for GENERATED BY DEFAULT ON NULL in
Postgres.

Thoughts?

Thanks,
Himanshu


Re: Error "initial slot snapshot too large" in create replication slot

2021-11-02 Thread Dilip Kumar
On Tue, Oct 19, 2021 at 2:25 PM Dilip Kumar  wrote:
>
> On Tue, Oct 12, 2021 at 11:30 AM Dilip Kumar  wrote:
> >
> > On Tue, Oct 12, 2021 at 10:35 AM Kyotaro Horiguchi
> >  wrote:
> > >
> > > At Tue, 12 Oct 2021 13:59:59 +0900 (JST), Kyotaro Horiguchi 
> > >  wrote in
> > > > So I came up with the attached version.
> > >
> > > Sorry, it was losing a piece of change.
> >
> > Yeah, at a high level this is on the idea I have in mind, I will do a
> > detailed review in a day or two.  Thanks for working on this.
>
> While doing the detailed review, I think there are a couple of
> problems with the patch,  the main problem of storing all the xid in
> the snap->subxip is that once we mark the snapshot overflown then the
> XidInMVCCSnapshot, will not search the subxip array, instead it will
> fetch the topXid and search in the snap->xip array.

I missed that you are marking the snapshot as takenDuringRecovery so
your fix looks fine.

Another issue is
> that the total xids could be GetMaxSnapshotSubxidCount()
> + GetMaxSnapshotXidCount().
>

I have fixed this, the updated patch is attached.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From b6ca7a05b05a9244bf400d14b75dfcd1f99c76cd Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 12 Oct 2021 13:53:27 +0900
Subject: [PATCH v3] Allow overflowed snapshot when creating logical
 replication slot

A snapshot can hold top-level XIDs only up to the number of server processes,
but SnapBuildInitialSnapshot tries to store all top-level and sub XIDs
there and fails under certain circumstances. Instead, create a
"takenDuringRecovery" snapshot, which stores all XIDs in the
subxip array.  In addition, allow creating an overflowed
snapshot by adding to the reorder buffer a function to tell whether an XID
is a top-level XID or not.
---
 src/backend/replication/logical/reorderbuffer.c | 20 +
 src/backend/replication/logical/snapbuild.c | 57 +++--
 src/include/replication/reorderbuffer.h |  1 +
 3 files changed, 66 insertions(+), 12 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 46e6660..4e452cc 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3327,6 +3327,26 @@ ReorderBufferXidHasBaseSnapshot(ReorderBuffer *rb, TransactionId xid)
 
 
 /*
+ * ReorderBufferXidIsKnownSubXact
+ *		Returns true if the xid is a known subtransaction.
+ */
+bool
+ReorderBufferXidIsKnownSubXact(ReorderBuffer *rb, TransactionId xid)
+{
+	ReorderBufferTXN *txn;
+
+	txn = ReorderBufferTXNByXid(rb, xid, false,
+NULL, InvalidXLogRecPtr, false);
+
+	/* a known subtxn? */
+	if (txn && rbtxn_is_known_subxact(txn))
+		return true;
+
+	return false;
+}
+
+
+/*
  * ---
  * Disk serialization support
  * ---
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index dbdc172..9fd28d4 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -531,8 +531,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 {
 	Snapshot	snap;
 	TransactionId xid;
-	TransactionId *newxip;
-	int			newxcnt = 0;
+	TransactionId *newsubxip;
+	int			maxxidcnt;
+	bool		overflowed = false;
 
 	Assert(!FirstSnapshotSet);
 	Assert(XactIsoLevel == XACT_REPEATABLE_READ);
@@ -568,9 +569,13 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 	MyProc->xmin = snap->xmin;
 
-	/* allocate in transaction context */
-	newxip = (TransactionId *)
-		palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
+	/*
+	 * Allocate in transaction context. We use only subxip to store xids. See
+	 * GetSnapshotData for details.
+	 */
+	newsubxip = (TransactionId *) palloc(sizeof(TransactionId) *
+		 (GetMaxSnapshotXidCount() +
+		  GetMaxSnapshotSubxidCount()));
 
 	/*
 	 * snapbuild.c builds transactions in an "inverted" manner, which means it
@@ -578,6 +583,9 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	 * classical snapshot by marking all non-committed transactions as
 	 * in-progress. This can be expensive.
 	 */
+retry:
+	newsubxcnt = 0;
+
 	for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
 	{
 		void	   *test;
@@ -591,12 +599,29 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 		if (test == NULL)
 		{
-			if (newxcnt >= GetMaxSnapshotXidCount())
-ereport(ERROR,
-		(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-		 errmsg("initial slot snapshot too large")));
+			bool add = true;
 
-			newxip[newxcnt++] = xid;
+			/* exclude subxids when subxip is overflowed */
+			if (overflowed &&
+ReorderBufferXidIsKnownSubXact(builder->reorder, xid))
+add = false;
+
+			if (add)
+			{
+if (newsubxcnt >= GetMaxSnapshotSubxidCount())
+{
+	if (overflowed)
+		ereport(ERROR,
+

Re: should we enable log_checkpoints out of the box?

2021-11-02 Thread Daniel Gustafsson
> On 1 Nov 2021, at 14:02, Magnus Hagander  wrote:

> There is still some data that log_checkpoint gives you that the statistics 
> don't -- but maybe we should instead look at exposing that information in 
> pg_stat_bgwriter, rather than changing the default.

I don't have strong opinions on changing the default, but +1 on
investigating exposing the information in the view for monitoring.
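
For reference, the checkpoint-related counters already exposed in that view can
be queried like this (column set as of this era of PostgreSQL; the
log_checkpoints-only details, e.g. per-checkpoint distances, are what's missing):

```sql
SELECT checkpoints_timed,       -- scheduled checkpoints
       checkpoints_req,         -- requested (forced) checkpoints
       checkpoint_write_time,   -- ms spent writing files to disk
       checkpoint_sync_time,    -- ms spent syncing files to disk
       buffers_checkpoint       -- buffers written during checkpoints
FROM pg_stat_bgwriter;
```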

--
Daniel Gustafsson   https://vmware.com/





  1   2   >