Re: \dP and \dX use ::regclass without "pg_catalog."

2022-01-07 Thread Michael Paquier
On Fri, Jan 07, 2022 at 03:56:20PM -0600, Justin Pryzby wrote:
> I'd propose these.

Applied and backpatched down to 14.  One of the aliases is present in
13~, but I have let that alone.  The detection regex posted upthread
is kind of cool.  
--
Michael


signature.asc
Description: PGP signature


Re: Non-superuser subscription owners

2022-01-07 Thread Jeff Davis
On Wed, 2021-12-15 at 12:23 -0800, Mark Dilger wrote:
> > On Nov 24, 2021, at 4:30 PM, Jeff Davis  wrote:
> > 
> > We need to do permission checking for WITH CHECK OPTION and RLS.
> > The
> > patch right now allows the subscription to write data that an RLS
> > policy forbids.
> 
> Version 4 of the patch, attached, no longer allows RLS to be
> circumvented, but does so in a coarse-grained fashion.

Committed.

I tried to do some performance testing to see if there was any impact
of the extra catalog + ACL checks. Logical replication seems slow
enough -- something like 3X slower than local inserts -- that it didn't
seem to make a difference.

To test it, I did the following:
  1. sent a SIGSTOP to the logical apply worker
  2. loaded more data in publisher
  3. made the subscriber a sync replica
  4. timed the following:
a. sent a SIGCONT to the logical apply worker
b. insert a single tuple on the publisher side
c. wait for the insert to return, indicating that logical
   replication is done up to that point

Does anyone have a better way to measure logical replication
performance?

Regards,
Jeff Davis






Re: Non-superuser subscription owners

2022-01-07 Thread Jeff Davis
On Sat, 2022-01-08 at 12:27 +0530, Amit Kapila wrote:
> For Update/Delete, we do read the table first via
> FindReplTupleInLocalRel(), so is there a need to check ACL_SELECT
> before that?

If it's logically an update/delete, then I think ACL_UPDATE/DELETE is
the right one to check. Do you have a different opinion?

Regards,
Jeff Davis






Re: Non-superuser subscription owners

2022-01-07 Thread Amit Kapila
On Thu, Dec 16, 2021 at 1:53 AM Mark Dilger
 wrote:
>
> > On Nov 24, 2021, at 4:30 PM, Jeff Davis  wrote:
> >
> > We need to do permission checking for WITH CHECK OPTION and RLS. The
> > patch right now allows the subscription to write data that an RLS
> > policy forbids.
>
> Version 4 of the patch, attached.
>

For Update/Delete, we do read the table first via
FindReplTupleInLocalRel(), so is there a need to check ACL_SELECT
before that?

-- 
With Regards,
Amit Kapila.




Re: Updating Copyright notices to 2022?

2022-01-07 Thread Michael Paquier
On Fri, Jan 07, 2022 at 07:06:06PM -0500, Bruce Momjian wrote:
> Done, sorry for the delay.

No problem.  Thanks!
--
Michael




Re: row filtering for logical replication

2022-01-07 Thread Amit Kapila
On Fri, Jan 7, 2022 at 11:20 PM Euler Taveira  wrote:
>
> On Fri, Jan 7, 2022, at 6:05 AM, Amit Kapila wrote:
>
> Euler, I have one more question about this patch for you. I see that
> in the patch we are calling coerce_to_target_type() in
> pgoutput_row_filter_init_expr() but do we really need the same? We
> already do that via
> transformPubWhereClauses->transformWhereClause->coerce_to_boolean
> before storing where clause expression. It is not clear to me why that
> is required? We might want to add a comment if that is required.
>
> It is redundant. It seems to be an additional safeguard that should be removed.
> Good catch.
>

Thanks for the confirmation. Actually, it was raised by Vignesh in his
email [1].

[1] - 
https://www.postgresql.org/message-id/CALDaNm1_JVg_hqoGex_FVca_HPF46n9oDDB9dsp1SrPuaVpp-w%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: skip replication slot snapshot/map file removal during end-of-recovery checkpoint

2022-01-07 Thread Bharath Rupireddy
On Thu, Jan 6, 2022 at 5:04 AM Bossart, Nathan  wrote:
>
> On 12/23/21, 3:17 AM, "Bharath Rupireddy" 
>  wrote:
> > Currently the end-of-recovery checkpoint can be much slower, impacting
> > the server availability, if there are many replication slot files
> > (*.snap or map-*) to be enumerated and deleted. How about skipping
> > the *.snap and map-* file handling during the end-of-recovery
> > checkpoint? It makes the server available faster and the next regular
> > checkpoint can deal with these files. If required, we can have a GUC
> > (skip_replication_slot_file_handling or some other better name) to
> > control this default being the existing behavior.
>
> I suggested something similar as a possibility in the other thread
> where these tasks are being discussed [0].  I think it is worth
> considering, but IMO it is not a complete solution to the problem.  If
> there are frequently many such files to delete and regular checkpoints
> are taking longer, the shutdown/end-of-recovery checkpoint could still
> take a while.  I think it would be better to separate these tasks from
> checkpointing instead.
>
> [0] https://postgr.es/m/A285A823-0AF2-4376-838E-847FA4710F9A%40amazon.com

Thanks. I agree to solve it as part of the other thread and close this
thread here.

Regards,
Bharath Rupireddy.




Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers

2022-01-07 Thread Jeff Davis
On Fri, 2022-01-07 at 14:54 -0800, Andres Freund wrote:
> > If you only promote the furthest-ahead sync replica (which is what
> > you
> > should be doing if you have quorum commit), why wouldn't that work?
> 
> Remove "sync" from the above sentence, and the sentence holds true
> for
> combinations of sync/async replicas as well.

Technically that's true, but it seems like a bit of a strange use case.
I would think people doing that would just include those async replicas
in the sync quorum instead.

The main case I can think of for a mix of sync and async replicas is
that they are just managed differently. For instance, the sync replica
quorum is managed for a core part of the system, strategically
allocated on good hardware in different locations to minimize the
chance of dependent failures; while the async read replicas are
optional for taking load off the primary, and may appear/disappear in
whatever location and on whatever hardware is most convenient.

But if an async replica can get ahead of the sync rep quorum, then the
most recent transactions can appear in query results, so that means the
WAL shouldn't be lost, and the async read replicas become a part of the
durability model.

If the async read replica can't be promoted because it's not suitable
(due to location, hardware, whatever), then you need to frantically
copy the final WAL records out to an instance in the sync rep quorum.
That requires extra ceremony for every failover, and might be dubious
depending on how safe the WAL on your async read replicas is, and
whether there are dependent failure risks.

Yeah, I guess there could be some use case woven amongst those caveats,
but I'm not sure if anyone is actually doing that combination of things
safely today. If someone is, it would be interesting to know more about
that use case.

The proposal in this thread is quite a bit simpler: manage your sync
quorum and your async read replicas separately, and keep the sync rep
quorum ahead.

> > > To me this just sounds like trying to shoehorn something into
> > > syncrep
> > > that
> > > it's not made for.
> > 
> > What *is* sync rep made for?

This was a sincere question and an answer would be helpful. I think
many of the discussions about sync rep get derailed because people have
different ideas about when and how it should be used, and the
documentation is pretty light.

> This is especially relevant in cases where synchronous_commit=on vs
> local is
> used selectively

That's an interesting point.

However, it's hard for me to reason about "kinda durable" and "a little
more durable" and I'm not sure how many people would care about that
distinction.

> I don't see that. This presumes that WAL replicated to async replicas
> is
> somehow bad.

Simple case: primary and async read replica are in the same server
rack. Sync replicas are geographically distributed with quorum commit.
Read replica gets the WAL first (because it's closest), starts
answering queries that include that WAL, and then the entire rack
catches fire. Now you've returned results to the client, but lost the
transactions.

Regards,
Jeff Davis






Re: [PoC] Delegating pg_ident to a third party

2022-01-07 Thread Jacob Champion
On Tue, 2022-01-04 at 22:24 -0500, Stephen Frost wrote:
> On Tue, Jan 4, 2022 at 18:56 Jacob Champion  wrote:
> > 
> > Could you talk more about the use cases for which having the "actual
> > user" is better? From an auditing perspective I don't see why
> > "authenticated as ja...@example.net, logged in as admin" is any worse
> > than "logged in as jacob".
> 
> The above case isn’t what we are talking about, as far as I
> understand anyway. You’re suggesting “authenticated as 
> ja...@example.net, logged in as sales” where the user in the database
> is “sales”.  Consider triggers which only have access to “sales”, or
> a tool like pgaudit which only has access to “sales”.

Okay. So an additional getter function in miscadmin.h, and surfacing
that function to trigger languages, are needed to make authn_id more
generally useful. Any other cases you can think of?

> > I was responding more to your statement that "Being able to have roles
> > and memberships automatically created is much more the direction that
> > I'd say we should be going in". It's not that one-role-per-user is
> > inherently wasteful, but forcing role proliferation where it's not
> > needed is. If all users have the same set of permissions, there doesn't
> > need to be more than one role. But see below.
> 
> Just saying it’s wasteful isn’t actually saying what is wasteful about it.

Well, I felt like it was irrelevant; you've already said you have no
intention to force one-user-per-role.

But to elaborate: *forcing* one-user-per-role is wasteful, because if I
have a thousand employees, and I want to give all my employees access
to a guest role in the database, then I have to administer a thousand
roles: maintaining them through dump/restores and pg_upgrades, auditing
them to figure out why Bob in Accounting somehow got a different
privilege GRANT than the rest of the users, adding new accounts,
purging old ones, maintaining the inevitable scripts that will result.

If none of the users need to be "special" in any way, that's all wasted
overhead. (If they do actually need to be special, then at least some
of that overhead becomes necessary. Otherwise it's waste.) You may be
able to mitigate the cost of the waste, or absorb the mitigations into
Postgres so that the user can't see the waste, or decide that the waste
is not costly enough to care about. It's still waste.

> > > I'm also not suggesting that we make everyone do the same
> > > thing, indeed, later on I was supportive of having an external system
> > > provide the mapping.  Here, I'm just making the point that we should
> > > also be looking at automatic role/membership creation.
> > 
> > Gotcha. Agreed; that would open up the ability to administer role
> > privileges externally too, which would be cool. That could be used in
> > tandem with something like this patchset.
> 
> Not sure exactly what you’re referring to here by “administer role
> privileges externally too”..?  Curious to hear what you are imagining
> specifically.

Just that it would be nice to centrally provision role GRANTs as well
as role membership, that's all. No specifics in mind, and I'm not even
sure if LDAP would be a helpful place to put that sort of config.

> I’d also point out though that having to do an ldap lookup on every
> login to PG is *already* an issue in some environments, having to do
> multiple amplifies that.

You can't use the LDAP auth method with this patch yet, so this concern
is based on code that doesn't exist. It's entirely possible that you
could do the role query as part of the first bound connection. If that
proves unworkable, then yes, I agree that it's a concern.

> Not to mention that when the ldap servers can’t be reached for some
> reason, no one can log into the database and that’s rather
> unfortunate too.

Assuming you have no caches, then yes. That might be a pretty good
argument for allowing ldapmap and map to be used together, actually, so
that you can have some critical users who can always log in as
"themselves" or "admin" or etc. Or maybe it's an argument for allowing
HBA to handle fallback methods of authentication.

Luckily I think it's pretty easy to communicate to LDAP users that if
*all* your login infrastructure goes down, you will no longer be able
to log in. They're probably used to that idea, if they haven't set up
any availability infra.

>  These are, of course, arguments for moving away from methods that
> require checking with some other system synchronously during login-
> which is another reason why it’s better to have the authentication
> credentials easily map to the PG role, without the need for external
> checks at login time.  That’s done with today’s pg_ident, but this
> patch would change that.

There are arguments for moving towards synchronous checks as well.
Central revocation of credentials (in timeframes shorter than ticket
expiration) is what comes to mind. Revocation is hard and usually
conflicts with the desire for availability.

What's 

null iv parameter passed to combo_init()

2022-01-07 Thread Zhihong Yu
Hi,
In contrib/pgcrypto/pgcrypto.c :

err = px_combo_init(c, (uint8 *) VARDATA_ANY(key), klen, NULL, 0);

Note: NULL is passed as iv.

When combo_init() is called,

if (ivlen > ivs)
    memcpy(ivbuf, iv, ivs);
else
    memcpy(ivbuf, iv, ivlen);

It seems we need to consider the case of null being passed as iv for
memcpy() because of this:

/usr/include/string.h:44:28: note: nonnull attribute specified here

What do you think of the following patch?
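For concreteness, the kind of guard being proposed might look like the sketch below. This is illustrative only (the actual patch is attached as binary data); `init_ivbuf` and `IVS` are hypothetical names, and it assumes a missing IV should behave as all-zeroes:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IVS 16                  /* illustrative fixed IV buffer size */

/* Hypothetical NULL-safe version of the copy in combo_init(): treat a
 * missing IV as all-zeroes instead of passing NULL to memcpy(), which
 * is what the compiler's nonnull note complains about. */
static void
init_ivbuf(unsigned char *ivbuf, const unsigned char *iv, unsigned ivlen)
{
    memset(ivbuf, 0, IVS);
    if (iv == NULL)
        return;                 /* never hand NULL to memcpy() */
    if (ivlen > IVS)
        memcpy(ivbuf, iv, IVS);
    else
        memcpy(ivbuf, iv, ivlen);
}
```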

Cheers


memcpy-null-iv.patch
Description: Binary data


Re: pgsql: Refactor tar method of walmethods.c to rely on the compression m

2022-01-07 Thread Michael Paquier
On Fri, Jan 07, 2022 at 03:41:16PM +0100, Christoph Berg wrote:
> since about this commit, pg_wal.tar is no longer compressed at all:

Thanks.  That's a thinko coming from the fact that
Z_DEFAULT_COMPRESSION is -1, a combination possible when specifying only
--gzip.  So, fixed.
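To illustrate the thinko (hypothetical function names; this is not the actual commit): zlib's Z_DEFAULT_COMPRESSION sentinel is -1, so a check of the form `level > 0` misclassifies "use the default level" as "no compression":

```c
#include <assert.h>
#include <stdbool.h>

#define Z_DEFAULT_COMPRESSION (-1)  /* zlib's "use default level" sentinel */

/* Sketch of the bug class described above: when only --gzip is given,
 * the level arrives as Z_DEFAULT_COMPRESSION (-1), so a "level > 0"
 * test silently disables compression. */
static bool buggy_wants_compression(int level) { return level > 0; }

/* Treating only an explicit 0 as "no compression" handles -1 correctly. */
static bool fixed_wants_compression(int level) { return level != 0; }
```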
--
Michael




Re: Updating Copyright notices to 2022?

2022-01-07 Thread Bruce Momjian
On Sat, Jan  8, 2022 at 08:51:08AM +0900, Michael Paquier wrote:
> On Fri, Jan 07, 2022 at 04:57:11PM -0500, Bruce Momjian wrote:
> > I like to be current on my email before I do it, and I have been
> > completing a Debian upgrade this week so am behind.  If you would like
> > to do it, please go ahead.  If not, I will try to do it before Sunday.
> 
> Thanks for the update.  There is no rush here, so waiting for a couple
> of days won't change anything.  And historically, you are the one who
> has always done the change :)

Done, sorry for the delay.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Updating Copyright notices to 2022?

2022-01-07 Thread Michael Paquier
On Fri, Jan 07, 2022 at 04:57:11PM -0500, Bruce Momjian wrote:
> I like to be current on my email before I do it, and I have been
> completing a Debian upgrade this week so am behind.  If you would like
> to do it, please go ahead.  If not, I will try to do it before Sunday.

Thanks for the update.  There is no rush here, so waiting for a couple
of days won't change anything.  And historically, you are the one who
has always done the change :)
--
Michael




Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-07 Thread Corey Huinker
>
>  I'd be
> curious to know where we found the bits for that -- the tuple header
> isn't exactly replete with extra bit space.
>

+1 - and can we somehow shoehorn a version # into the new format so we
never have to look for spare bits again?


Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers

2022-01-07 Thread Andres Freund
Hi,

On 2022-01-07 14:36:46 -0800, Jeff Davis wrote:
> On Fri, 2022-01-07 at 12:22 -0800, Andres Freund wrote:
> > I don't see how it can *not* cause a major performance / latency
> > gotcha. You're deliberately delaying replication after all?
> 
> Are there use cases where someone wants sync rep, and also wants their
> read replicas to get ahead of the sync rep quorum?

Yes. Not in the sense of being ahead of the sync replicas, but in the sense of
being as caught up as possible, and to keep the lost WAL in case of crashes as
low as possible.


> > another sync replica would still not be guaranteed to be able to
> > follow the
> > newly promoted primary.
> 
> If you only promote the furthest-ahead sync replica (which is what you
> should be doing if you have quorum commit), why wouldn't that work?

Remove "sync" from the above sentence, and the sentence holds true for
combinations of sync/async replicas as well.


> > To me this just sounds like trying to shoehorn something into syncrep
> > that
> > it's not made for.
> 
> What *is* sync rep made for?
> 
> The only justification in the docs is around durability:
> 
> "[sync rep] extends that standard level of durability offered by a
> transaction commit... [sync rep] can provide a much higher level of
> durability..."

What is being proposed here doesn't increase durability. It *reduces* it -
it's less likely that WAL is replicated before a crash.

This is especially relevant in cases where synchronous_commit=on vs local is
used selectively - after this change the durability of local changes is very
substantially *reduced*, because they have to wait for the sync replicas
before also being replicated to async replicas, even though the COMMIT doesn't
wait for replication. So this "feature" just reduces the durability of such
commits.

The performance overhead of syncrep is high enough that plenty real-world
usages cannot afford to use it for all transactions. And that's normally fine
from a business logic POV - often the majority of changes aren't that
important. It's non-trivial from an application implementation POV though, but
that's imo a separate concern.


> If we take that at face value, then it seems logical to say that async
> read replicas should not get ahead of sync replicas.

I don't see that. This presumes that WAL replicated to async replicas is
somehow bad. But pg_rewind exist, async replicas can be promoted and WAL from
the async replicas can be transferred to the synchronous replicas if only
those should be promoted.

Greetings,

Andres Freund




Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-07 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Jan 5, 2022 at 9:53 PM Alexander Korotkov  
> wrote:
> > 5) 32-bit limitation within the page mentioned upthread by Stephen
> > Frost should be also carefully considered.  Ideally, we should get rid
> > of it, but I don't have particular ideas in this field for now.  At
> > least, we should make sure we did our best at error reporting and
> > monitoring capabilities.
> 
> I don't think I understand the thinking here. As long as we retain the
> existing limit that the oldest running XID can't be more than 2
> billion XIDs in the past, we can't ever need to throw an error. A new
> page modification that finds very old XIDs on the page can always
> escape trouble by pruning the page and freezing whatever old XIDs
> survive pruning.

So we'll just fail such an old transaction?  Or force a server restart?
or..?  What if we try to signal that transaction and it doesn't go away?

> I would argue that it's smarter not to change the in-memory
> representation of XIDs to 64-bit in places like the ProcArray. As you
> mention in (4), that might hurt performance. But also, the benefit is
> minimal. Nobody is really sad that they can't keep transactions open
> forever. They are sad because the system has severe bloat and/or shuts
> down entirely. Some kind of change along these lines can fix the
> second of those problems, and that's progress.

I brought up the concern that I did because I would be a bit sad if I
couldn't have a transaction open for a day on a very high rate system of
the type being discussed here.  Would be fantastic if we had a solution
to that issue, but I get that reducing the need to vacuum and such would
be a really nice improvement even if we can't make long running
transactions work.  Then again, if we do actually change the in-memory
bits - then maybe we could have such a long-running transaction, provided
it didn't try to make an update to a page with really old xids on it,
which might be entirely reasonable in a lot of cases.  I do still worry
about how we explain what the limitation here is and how to avoid
hitting it.  Definitely seems like a 'gotcha' that people are going to
complain about, though hopefully not as much of one as the current cases
we hear about of vacuum falling behind and the system running out of
xids.
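For reference, the "really old xids" limit being discussed comes from PostgreSQL's 32-bit modulo comparison of XIDs; a self-contained sketch of that rule follows (the real TransactionIdPrecedes() in transam.c additionally special-cases permanent XIDs):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Modulo-2^32 XID ordering: a precedes b iff the wrapped difference is
 * negative.  This only gives a consistent answer while the two XIDs are
 * less than 2^31 (~2 billion) apart, which is why XIDs older than that
 * must be frozen before the counter advances further. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}
```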

> > I think the realistic goal for PG 15 development cycle would be
> > agreement on a roadmap for all the items above (and probably some
> > initial implementations).
> 
> +1. Trying to rush something through to commit is just going to result
> in a bunch of bugs. We need to work through the issues carefully and
> take the time to do it well.

+1.

Thanks,

Stephen




Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers

2022-01-07 Thread Jeff Davis
On Fri, 2022-01-07 at 12:22 -0800, Andres Freund wrote:
> I don't see how it can *not* cause a major performance / latency
> gotcha. You're deliberately delaying replication after all?

Are there use cases where someone wants sync rep, and also wants their
read replicas to get ahead of the sync rep quorum?

If the use case doesn't exist, it doesn't make sense to talk about how
well it performs.

> another sync replica would still not be guaranteed to be able to
> follow the
> newly promoted primary.

If you only promote the furthest-ahead sync replica (which is what you
should be doing if you have quorum commit), why wouldn't that work?
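The promotion rule described here - pick the standby with the greatest flushed LSN - can be sketched as follows (illustrative types and names, not server code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;    /* an LSN, as in PostgreSQL */

typedef struct
{
    const char *name;
    XLogRecPtr  flush_lsn;      /* how far this standby has flushed WAL */
} Replica;

/* Return the furthest-ahead standby.  With quorum commit, at least one
 * member of the quorum has every acknowledged transaction, so promoting
 * the furthest-ahead one loses nothing that was acknowledged. */
static const Replica *
furthest_ahead(const Replica *replicas, size_t n)
{
    const Replica *best = NULL;

    for (size_t i = 0; i < n; i++)
        if (best == NULL || replicas[i].flush_lsn > best->flush_lsn)
            best = &replicas[i];
    return best;
}
```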

> To me this just sounds like trying to shoehorn something into syncrep
> that
> it's not made for.

What *is* sync rep made for?

The only justification in the docs is around durability:

"[sync rep] extends that standard level of durability offered by a
transaction commit... [sync rep] can provide a much higher level of
durability..."

If we take that at face value, then it seems logical to say that async
read replicas should not get ahead of sync replicas.

Regards,
Jeff Davis






Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-01-07 Thread Peter Geoghegan
On Fri, Jan 7, 2022 at 12:24 PM Robert Haas  wrote:
> This seems like a weak argument. Sure, you COULD hard-code the limit
> to be autovacuum_freeze_max_age/2 rather than making it a separate
> tunable, but I don't think it's better. I am generally very skeptical
> about the idea of using the same GUC value for multiple purposes,
> because it often turns out that the optimal value for one purpose is
> different than the optimal value for some other purpose.

I thought I was being conservative by suggesting
autovacuum_freeze_max_age/2. My first thought was to teach VACUUM to
make its FreezeLimit "OldestXmin - autovacuum_freeze_max_age". To me
these two concepts really *are* the same thing: vacrel->FreezeLimit
becomes a backstop, just as anti-wraparound autovacuum (the
autovacuum_freeze_max_age cutoff) becomes a backstop.
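As a sketch of the arithmetic (a hypothetical function; the real cutoff derivation in vacuum.c involves more clamping and GUC interactions): the backstop is just OldestXmin minus the chosen age, computed modulo 2^32 and kept out of the special-XID range:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical sketch of the suggested backstop: FreezeLimit is
 * OldestXmin minus the backstop age (autovacuum_freeze_max_age, or half
 * of it), nudged past the special XIDs 0..2 that are reserved for
 * bootstrap and invalid values. */
static TransactionId
compute_freeze_limit(TransactionId oldest_xmin, uint32_t backstop_age)
{
    TransactionId limit = oldest_xmin - backstop_age;   /* wraps mod 2^32 */

    if (limit < FirstNormalTransactionId)
        limit = FirstNormalTransactionId;
    return limit;
}
```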

Of course, an anti-wraparound VACUUM will do early freezing in the
same way as any other VACUUM will (with the patch series). So even
when the FreezeLimit backstop XID cutoff actually affects the behavior
of a given VACUUM operation, it may well not be the reason why most
individual tuples that we freeze get frozen. That is, most individual
heap pages will probably have tuples frozen for some other reason.
Though it depends on workload characteristics, most individual heap
pages will typically be frozen as a group, even here. This is a
logical consequence of the fact that tuple freezing and advancing
relfrozenxid are now only loosely coupled -- it's about as loose as
the current relfrozenxid invariant will allow.

> I feel generally that a lot of the argument you're making here
> supposes that tables are going to get vacuumed regularly.

> I agree that
> IF tables are being vacuumed on a regular basis, and if as part of
> that we always push relfrozenxid forward as far as we can, we will
> rarely have a situation where aggressive strategies to avoid
> wraparound are required.

It's all relative. We hope that (with the patch) cases that only ever
get anti-wraparound VACUUMs are limited to tables where nothing else
drives VACUUM, for sensible reasons related to workload
characteristics (like the pgbench_accounts example upthread). It's
inevitable that some users will misconfigure the system, though -- no
question about that.

I don't see why users that misconfigure the system in this way should
be any worse off than they would be today. They probably won't do
substantially less freezing (usually somewhat more), and will advance
pg_class.relfrozenxid in exactly the same way as today (usually a bit
better, actually). What have I missed?

Admittedly the design of the "Freeze tuples early to advance
relfrozenxid" patch (i.e. v5-0005-*patch) is still unsettled; I need
to verify that my claims about it are really robust. But as far as I
know they are. Reviewers should certainly look at that with a critical
eye.

> Now, I agree with you in part: I don't think it's obvious that it's
> useful to tune vacuum_freeze_table_age.

That's definitely the easier argument to make. After all,
vacuum_freeze_table_age will do nothing unless VACUUM runs before the
anti-wraparound threshold (autovacuum_freeze_max_age) is reached. The
patch series should be strictly better than that. Primarily because
it's "continuous", and so isn't limited to cases where the table age
falls within the "vacuum_freeze_table_age - autovacuum_freeze_max_age"
goldilocks age range.

> We should be VERY conservative about removing
> existing settings if there's any chance that somebody could use them
> to tune their way out of trouble.

I agree, I suppose, but right now I honestly can't think of a reason
why they would be useful.

If I am wrong about this then I'm probably also wrong about some basic
facet of the high-level design, in which case I should change course
altogether. In other words, removing the GUCs is not an incidental
thing. It's possible that I would never have pursued this project if I
didn't first notice how wrong-headed the GUCs are.

> So, let's see: if we see a page where the tuples are all-visible and
> we seize the opportunity to freeze it, we can spare ourselves the need
> to ever visit that page again (unless it gets modified). But if we
> only mark it all-visible and leave the freezing for later, the next
> aggressive vacuum will have to scan and dirty the page. I'm prepared
> to believe that it's worth the cost of freezing the page in that
> scenario.

That's certainly the most compelling reason to perform early freezing.
It's not completely free of downsides, but it's pretty close.

> There's another situation in which vacuum_freeze_min_age could apply,
> though: suppose the page isn't all-visible yet. I'd argue that in that
> case we don't want to run around freezing stuff unless it's quite old
> - like older than vacuum_freeze_table_age, say. Because we know we're
> going to have to revisit this page in the next vacuum anyway, and
> expending effort to freeze tuples that may be about to be modified
> again 

Re: [PATCH] allow src/tools/msvc/*.bat files to be called from the root of the source tree

2022-01-07 Thread Andrew Dunstan


On 1/4/22 08:37, Andrew Dunstan wrote:
> On 1/4/22 07:20, Michael Paquier wrote:
>> On Wed, Dec 29, 2021 at 09:48:14AM -0500, Andrew Dunstan wrote:
>>> Seems reasonable. I don't see any reason not to do it for pgbison.bat
>>> and pgflex.bat, just for the sake of consistency.
>> Yeah, that would close the loop.  Andrew, are you planning to check
>> and apply this patch?
>
>
> Sure, I can do that.
>
>

done


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Updating Copyright notices to 2022?

2022-01-07 Thread Bruce Momjian
On Thu, Jan  6, 2022 at 02:48:05PM +0900, Michael Paquier wrote:
> Hi all,
> 
> $subject is taken care of by Bruce every year, but this has not been
> done yet.  That's one run of src/tools/copyright.pl on HEAD (as of the
> attached) combined with updates of ./doc/src/sgml/legal.sgml and
> ./COPYRIGHT in the back-branches, so that's straight-forward.
> 
> Bruce, are you planning to refresh the branches?

I like to be current on my email before I do it, and I have been
completing a Debian upgrade this week so am behind.  If you would like
to do it, please go ahead.  If not, I will try to do it before Sunday.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: \dP and \dX use ::regclass without "pg_catalog."

2022-01-07 Thread Justin Pryzby
On Fri, Jan 07, 2022 at 08:08:57PM +0900, Michael Paquier wrote:
> On Fri, Jan 07, 2022 at 06:30:30PM +0900, Tatsuro Yamada wrote:
> > We should prefix them with pg_catalog as well.
> > Are you planning to make a patch?
> > If not, I'll make a patch later since that's where \dX is.
> 
> If any of you can make a patch, that would be great.  Thanks!

I'd propose these.
From d1e337daf81faa9a0a8c33a26091cbae396df7bb Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 6 Jan 2022 20:23:52 -0600
Subject: [PATCH 1/3] psql: qualify ::text with pg_catalog

backpatch to v14
---
 src/bin/psql/describe.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 7cbf5f7b105..b3fbff1a0c3 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4392,7 +4392,7 @@ listExtendedStats(const char *pattern)
 	initPQExpBuffer(&buf);
 	printfPQExpBuffer(&buf,
 	  "SELECT \n"
-	  "es.stxnamespace::pg_catalog.regnamespace::text AS \"%s\", \n"
+	  "es.stxnamespace::pg_catalog.regnamespace::pg_catalog.text AS \"%s\", \n"
 	  "es.stxname AS \"%s\", \n",
 	  gettext_noop("Schema"),
 	  gettext_noop("Name"));
@@ -4439,7 +4439,7 @@ listExtendedStats(const char *pattern)
 
 	processSQLNamePattern(pset.db, &buf, pattern,
 		  false, false,
-		  "es.stxnamespace::pg_catalog.regnamespace::text", "es.stxname",
+		  "es.stxnamespace::pg_catalog.regnamespace::pg_catalog.text", "es.stxname",
 		  NULL, "pg_catalog.pg_statistics_obj_is_visible(es.oid)");
 
 	appendPQExpBufferStr(&buf, "ORDER BY 1, 2;");
-- 
2.17.1

From 5033095abb3631bf01e4213752d5a4d3a24896fa Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 6 Jan 2022 20:25:07 -0600
Subject: [PATCH 2/3] psql: order a relation's extended statistics by name..

like \dX

not by OID

v15 only?
---
 src/bin/psql/describe.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index b3fbff1a0c3..daf06bbcc81 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2608,7 +2608,7 @@ describeOneTableDetails(const char *schemaname,
 			printfPQExpBuffer(&buf,
 			  "SELECT oid, "
 			  "stxrelid::pg_catalog.regclass, "
-			  "stxnamespace::pg_catalog.regnamespace AS nsp, "
+			  "stxnamespace::pg_catalog.regnamespace::pg_catalog.text AS nsp, "
 			  "stxname,\n"
 			  "pg_catalog.pg_get_statisticsobjdef_columns(oid) AS columns,\n"
 			  "  'd' = any(stxkind) AS ndist_enabled,\n"
@@ -2617,7 +2617,7 @@ describeOneTableDetails(const char *schemaname,
 			  "stxstattarget\n"
 			  "FROM pg_catalog.pg_statistic_ext stat\n"
 			  "WHERE stxrelid = '%s'\n"
-			  "ORDER BY 1;",
+			  "ORDER BY nsp, stxname;",
 			  oid);
 
 			result = PSQLexec(buf.data);
-- 
2.17.1

From 020f425c7ed67183275d64f19c34f69d63b8ddc6 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 7 Jan 2022 15:48:46 -0600
Subject: [PATCH 3/3] Remove unused table alias

---
 src/bin/psql/describe.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index daf06bbcc81..c62d3bd4f8a 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2615,7 +2615,7 @@ describeOneTableDetails(const char *schemaname,
 			  "  'f' = any(stxkind) AS deps_enabled,\n"
 			  "  'm' = any(stxkind) AS mcv_enabled,\n"
 			  "stxstattarget\n"
-			  "FROM pg_catalog.pg_statistic_ext stat\n"
+			  "FROM pg_catalog.pg_statistic_ext\n"
 			  "WHERE stxrelid = '%s'\n"
 			  "ORDER BY nsp, stxname;",
 			  oid);
@@ -2719,7 +2719,7 @@ describeOneTableDetails(const char *schemaname,
 				appendPQExpBufferStr(&buf, "  stxstattarget\n");
 			else
 				appendPQExpBufferStr(&buf, "  -1 AS stxstattarget\n");
-			appendPQExpBuffer(&buf, "FROM pg_catalog.pg_statistic_ext stat\n"
+			appendPQExpBuffer(&buf, "FROM pg_catalog.pg_statistic_ext\n"
 			  "WHERE stxrelid = '%s'\n"
 			  "ORDER BY 1;",
 			  oid);
-- 
2.17.1
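For review purposes, the kind of "detection regex" mentioned on this thread
can be sketched as a small stand-alone checker. This is an illustrative
reconstruction, not the tool actually posted: it flags "::" casts in a query
string that are not qualified with "pg_catalog.".

```c
#include <assert.h>
#include <string.h>

/* Count casts like "::text" or "::regclass" that are not written as
 * "::pg_catalog.text" etc.  Illustrative only; the check posted on the
 * list may differ. */
static int
count_unqualified_casts(const char *query)
{
    int         count = 0;
    const char *p = query;

    while ((p = strstr(p, "::")) != NULL)
    {
        p += 2;                 /* skip past the "::" */
        if (strncmp(p, "pg_catalog.", strlen("pg_catalog.")) != 0)
            count++;
    }
    return count;
}
```

Run over the query strings in describe.c, a checker like this would flag the
two casts fixed by the first patch above.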



Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-07 Thread Robert Haas
On Wed, Jan 5, 2022 at 9:53 PM Alexander Korotkov  wrote:
> I see at least the following major issues/questions in this patch.
> 1) Current code relies on the fact that TransactionId can be
> atomically read from/written to shared memory.  With 32-bit systems
> and 64-bit TransactionId, that's not true anymore.  Therefore, the
> patch has concurrency issues on 32-bit systems.  We need to carefully
> review these issues and provide a fallback for 32-bit systems.  I
> suppose nobody is thinking about dropping support for 32-bit systems, right?

I think that's right. Not yet, anyway.

> Also, I wonder how critical for us is the overhead for 32-bit systems.
> They are going to become legacy, so overhead isn't so critical, right?

Agreed.

> 2) With this patch we still need to freeze to cut SLRUs.  This is
> especially problematic with Multixacts, because systems heavily using
> row-level locks can consume an enormous amount of multixacts.  That is
> especially problematic when we have 2x bigger multixacts.  We probably
> can provide an alternative implementation for multixact vacuum, which
> doesn't require scanning all the heaps.  That is a fair amount of
> work though.  The clog is much smaller and we can cut it more rarely.
> Perhaps, we could tolerate freezing to cut clog, couldn't we?

Right. We can't let any of the SLRUs -- don't forget about stuff like
pg_subtrans, which is a multiple of the size of clog -- grow without
bound, even if it never forces a system shutdown. I'm not sure it's a
good idea to think about introducing new freezing mechanisms at the
same time as we're making other changes, though. Just removing the
possibility of a wraparound shutdown without changing any of the rules
about how and when we freeze would be a significant advancement. Other
changes could be left for future work.

> 3) 2x bigger in-memory representation of TransactionId have overheads.
> In particular, it could mitigate the effect of recent advancements
> from Andres Freund.  I'm not exactly sure we should/can do something
> with this.  But I think this should be at least carefully analyzed.

Seems fair.

> 4) SP-GiST index stores TransactionId on pages.  Could we tolerate
> dropping SP-GiST indexes on a major upgrade?  Or do we have to invent
> something smarter?

Probably depends on how much work it is. SP-GiST indexes are not
mainstream, so I think we could at least consider breaking
compatibility, but it doesn't seem like a thing to do lightly.

> 5) 32-bit limitation within the page mentioned upthread by Stephen
> Frost should be also carefully considered.  Ideally, we should get rid
> of it, but I don't have particular ideas in this field for now.  At
> least, we should make sure we did our best at error reporting and
> monitoring capabilities.

I don't think I understand the thinking here. As long as we retain the
existing limit that the oldest running XID can't be more than 2
billion XIDs in the past, we can't ever need to throw an error. A new
page modification that finds very old XIDs on the page can always
escape trouble by pruning the page and freezing whatever old XIDs
survive pruning.

I would argue that it's smarter not to change the in-memory
representation of XIDs to 64-bit in places like the ProcArray. As you
mention in (4), that might hurt performance. But also, the benefit is
minimal. Nobody is really sad that they can't keep transactions open
forever. They are sad because the system has severe bloat and/or shuts
down entirely. Some kind of change along these lines can fix the
second of those problems, and that's progress.

> I think the realistic goal for PG 15 development cycle would be
> agreement on a roadmap for all the items above (and probably some
> initial implementations).

+1. Trying to rush something through to commit is just going to result
in a bunch of bugs. We need to work through the issues carefully and
take the time to do it well.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 2)

2022-01-07 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Fri, Jan  7, 2022 at 03:56:39PM -0500, Stephen Frost wrote:
> > As for the details of how we allow control over it, I suppose there's a
> > number of options.  Having it in the HBA doesn't seem terrible, though I
> > suspect most will just want to enable it across the board and having to
> > have "compression=allowed" or whatever added to every hba line seems
> > likely to be annoying.  Maybe a global GUC and then allow the hba to
> > override?
> 
> Eww, I would like to avoid going in the GUC & pg_hba.conf direction,
> unless we have other cases where we already do this.

I mean, not exactly the same, but ... ssl?

Thanks,

Stephen




Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-07 Thread Robert Haas
On Thu, Jan 6, 2022 at 3:45 PM Peter Geoghegan  wrote:
> On Tue, Jan 4, 2022 at 9:40 PM Fujii Masao  
> wrote:
> > Could you tell me what happens if new tuple with XID larger than xid_base + 
> > 0x is inserted into the page? Such new tuple is not allowed to be 
> > inserted into that page?
>
> I fear that this patch will have many bugs along these lines. Example:
> Why is it okay that convert_page() may have to defragment a heap page,
> without holding a cleanup lock? That will subtly break code that holds
> a pin on the buffer, when a tuple slot contains a C pointer to a
> HeapTuple in shared memory (though only if we get unlucky).

Yeah. I think it's possible that some approach along the lines of what
is proposed here can work, but quality of implementation is a big
issue. This stuff is not easy to get right. Another thing that I'm
wondering about is the "double xmax" representation. That requires
some way of distinguishing when that representation is in use. I'd be
curious to know where we found the bits for that -- the tuple header
isn't exactly replete with extra bit space.
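As a toy illustration of the base-plus-offset representation being discussed
(all names here are invented, not the patch's identifiers), the per-page
window constraint Fujii Masao asked about looks roughly like:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: each page stores one 64-bit base, and each tuple's
 * xmin/xmax is a 32-bit offset from it, so a page can only hold XIDs
 * within a 2^32-wide window above the base. */
typedef struct
{
    uint64_t    xid_base;       /* 64-bit base, stored once per page */
} ToyPage;

static bool
page_can_store_xid(const ToyPage *page, uint64_t xid64)
{
    return xid64 >= page->xid_base &&
           xid64 - page->xid_base <= UINT32_MAX;
}
```

An insert above the window cannot proceed until the base is advanced, which
in turn means rewriting (freezing/pruning) the page's existing 32-bit
offsets first -- which is exactly where the cleanup-lock questions above
come in.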

Also, if we have an epoch of some sort that is included in new page
headers but not old ones, that adds branches to code that might
sometimes be quite hot. I don't know how much of a problem that is,
but it seems worth worrying about.

For all of that, I don't particularly agree with Jim Finnerty's idea
that we ought to solve the problem by forcing sufficient space to
exist in the page pre-upgrade. There are some advantages to such
approaches, but they make it really hard to roll out changes. You have
to roll out the enabling change first, wait until everyone is running
a release that supports it, and only then release the technology that
requires the additional page space. Since we don't put new features
into back-branches -- and an on-disk format change would be a poor
place to start -- that would make rolling something like this out take
many years. I think we'll be much happier putting all the complexity
in the new release.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq compression (part 2)

2022-01-07 Thread Bruce Momjian
On Fri, Jan  7, 2022 at 03:56:39PM -0500, Stephen Frost wrote:
> As for the details of how we allow control over it, I suppose there's a
> number of options.  Having it in the HBA doesn't seem terrible, though I
> suspect most will just want to enable it across the board and having to
> have "compression=allowed" or whatever added to every hba line seems
> likely to be annoying.  Maybe a global GUC and then allow the hba to
> override?

Eww, I would like to avoid going in the GUC & pg_hba.conf direction,
unless we have other cases where we already do this.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: libpq compression (part 2)

2022-01-07 Thread Stephen Frost
Greetings,

* Justin Pryzby (pry...@telsasoft.com) wrote:
> On Fri, Jan 07, 2022 at 01:46:00PM -0500, Bruce Momjian wrote:
> > On Sat, Jan  1, 2022 at 11:25:05AM -0600, Justin Pryzby wrote:
> > > Thanks for working on this.  The patch looks to be in good shape - I hope 
> > > more
> > > people will help to review and test it.  I took the liberty of creating a 
> > > new
> > > CF entry. http://cfbot.cputube.org/daniil-zakhlystov.html
> > > 
> > > +zpq_should_compress(ZpqStream * zpq, char msg_type, uint32 msg_len)
> > > +{
> > > +   return zpq_choose_compressor(zpq, msg_type, msg_len) == -1;
> > > 
> > > I think this is backwards , and should say != -1 ?
> > > 
> > > As written, the server GUC libpq_compression defaults to "on", and the 
> > > client
> > > doesn't request compression.  I think the server GUC should default to 
> > > off.
> > > I failed to convince Konstantin about this last year.  The reason is that 
> > > 1)
> > > it's a new feature; 2) with security implications.  An admin should need 
> > > to
> > > "opt in" to this.  I still wonder if this should be controlled by a new 
> > > "TYPE"
> > > in pg_hba (rather than a GUC); that would make it exclusive of SSL.
> > 
> > I assume this compression happens before it is encrypted for TLS
> > transport.  Second, compression was removed from TLS because there were
> > too many ways for HTTP to weaken encryption.  I assume the Postgres wire
> > protocol doesn't have similar exploit possibilities.
> 
> It's discussed in last year's thread.  The thinking is that there tends to be
> *fewer* exploitable opportunities between application->DB than between
> browser->app.

Yes, this was discussed previously and addressed.

> But it's still a known concern, and should default to off - as I said.

I'm not entirely convinced of this but also am happy enough as long as
the capability exists, no matter if it's off or on by default.

> That's also why I wondered if compression should be controlled by pg_hba,
> rather than a GUC.  To require/allow a DBA to opt in to it for specific 
> hosts.
> Or to make it exclusive of ssl.  We could choose to not support that case at
> all, or (depending on the implementation) refuse that combination of layers.

I'm definitely against us deciding that we know better than admins if
this is an acceptable trade-off in their environment, or not.  Letting
users/admins control it is fine, but I don't feel we should forbid it.

As for the details of how we allow control over it, I suppose there's a
number of options.  Having it in the HBA doesn't seem terrible, though I
suspect most will just want to enable it across the board and having to
have "compression=allowed" or whatever added to every hba line seems
likely to be annoying.  Maybe a global GUC and then allow the hba to
override?
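Such an override might look something like this in pg_hba.conf -- the
compression option and its syntax are purely hypothetical here; nothing like
it exists today:

```
# Global GUC: libpq_compression = on
# Per-line override (illustrative syntax only):
hostssl  all  all  10.0.0.0/8    scram-sha-256  compression=off
host     all  all  10.1.0.0/16   scram-sha-256  compression=allowed
```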

Thanks,

Stephen




Re: make MaxBackends available in _PG_init

2022-01-07 Thread Robert Haas
On Fri, Jan 7, 2022 at 1:09 PM Bossart, Nathan  wrote:
> I wouldn't be surprised to learn of other cases, but I've only
> encountered this specific issue with MaxBackends.  I think MaxBackends
> is notable because it is more likely to be used by preloaded libraries
> but is intentionally initialized after loading them.  As noted in
> an earlier message on this thread [0], using MaxBackends in a call to
> RequestAddinShmemSpace() in _PG_init() may not reliably cause
> problems, too.

Yes, I think MaxBackends is a particularly severe case. I've seen this
problem with that GUC multiple times, and never with any other one.

> > It seems overkill to remove "extern" from MaxBackends and replace 
> > MaxBackends with GetMaxBackends() in the existing PostgreSQL codes. I'm not 
> > sure how much it's actually worth doing that.  Instead, isn't it enough to 
> > just add the comment like "Use GetMaxBackends() if you want to treat the 
> > lookup for uninitialized MaxBackends as an error" in the definition of 
> > MaxBackends?
>
> While that approach would provide a way to safely retrieve the value,
> I think it would do little to prevent the issue in practice.  If the
> size of the patch is a concern, we could also convert MaxBackends into
> a macro for calling GetMaxBackends().  This could also be a nice way
> to avoid breaking extensions that are using it correctly while
> triggering ERRORs for extensions that are using it incorrectly.  I've
> attached a new version of the patch that does it this way.

That's too magical for my taste.
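For concreteness, the getter-plus-macro approach under discussion could look
roughly like this sketch (names mirror the thread, but this is not the
actual patch):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int  max_backends_value = 0;     /* 0 == "not initialized yet" */

/* Checked getter: error out if read before initialization. */
static int
GetMaxBackends(void)
{
    if (max_backends_value == 0)
    {
        fprintf(stderr, "MaxBackends used before initialization\n");
        abort();                /* stand-in for elog(ERROR, ...) */
    }
    return max_backends_value;
}

static void
InitializeMaxBackends(int n)
{
    max_backends_value = n;     /* the variable, not the macro */
}

/* Existing code that says "MaxBackends" silently goes through the
 * checked getter -- the part deemed "too magical" above. */
#define MaxBackends GetMaxBackends()
```

The magic is that call sites keep writing MaxBackends while actually
invoking a function that can fail, which is what makes the idea both
backward-compatible and surprising.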

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-01-07 Thread Robert Haas
On Thu, Jan 6, 2022 at 5:46 PM Peter Geoghegan  wrote:
> One obvious reason for this is that the opportunistic freezing stuff
> is expected to be the thing that usually forces freezing -- not
> vacuum_freeze_min_age, nor FreezeLimit, nor any other XID-based
> cutoff. As you more or less pointed out yourself, we still need
> FreezeLimit as a backstop mechanism. But the value of FreezeLimit can
> just come from autovacuum_freeze_max_age/2 in all cases (no separate
> GUC), or something along those lines. We don't particularly expect the
> value of FreezeLimit to matter, at least most of the time. It should
> only noticeably affect our behavior during anti-wraparound VACUUMs,
> which become rare with the patch (e.g. my pgbench_accounts example
> upthread). Most individual tables will never get even one
> anti-wraparound VACUUM -- it just doesn't ever come for most tables in
> practice.

This seems like a weak argument. Sure, you COULD hard-code the limit
to be autovacuum_freeze_max_age/2 rather than making it a separate
tunable, but I don't think it's better. I am generally very skeptical
about the idea of using the same GUC value for multiple purposes,
because it often turns out that the optimal value for one purpose is
different than the optimal value for some other purpose. For example,
the optimal amount of memory for a hash table is likely different than
the optimal amount for a sort, which is why we now have
hash_mem_multiplier. When it's not even the same value that's being
used in both places, but the original value in one place and a value
derived from some formula in the other, the chances of things working
out are even less.

I feel generally that a lot of the argument you're making here
supposes that tables are going to get vacuumed regularly. I agree that
IF tables are being vacuumed on a regular basis, and if as part of
that we always push relfrozenxid forward as far as we can, we will
rarely have a situation where aggressive strategies to avoid
wraparound are required. However, I disagree strongly with the idea
that we can assume that tables will get vacuumed regularly. That can
fail to happen for all sorts of reasons. One of the common ones is a
poor choice of autovacuum configuration. The most common problem in my
experience is a cost limit that is too low to permit the amount of
vacuuming that is actually required, but other kinds of problems like
not enough workers (so tables get starved), too many workers (so the
cost limit is being shared between many processes), autovacuum=off
either globally or on one table (because of ... reasons),
autovacuum_vacuum_insert_threshold = -1 plus not many updates (so
nothing ever triggers the vacuum), autovacuum_naptime=1d (actually seen
in the real world! ... and, no, it didn't work well), or stats
collector problems are all possible. We can *hope* that there are
going to be regular vacuums of the table long before wraparound
becomes a danger, but realistically, we better not assume that in our
choice of algorithms, because the real world is a messy place where
all sorts of crazy things happen.

Now, I agree with you in part: I don't think it's obvious that it's
useful to tune vacuum_freeze_table_age. When I advise customers on how
to fix vacuum problems, I am usually telling them to increase
autovacuum_vacuum_cost_limit, possibly also with an increase in
autovacuum_workers; or to increase or decrease
autovacuum_freeze_max_age depending on which problem they have; or
occasionally to adjust settings like autovacuum_naptime. It doesn't
often seem to be necessary to change vacuum_freeze_table_age or, for
that matter, vacuum_freeze_min_age. But if we remove them and then
discover scenarios where tuning them would have been useful, we'll
have no options for fixing PostgreSQL systems in the field. Waiting
for the next major release in such a scenario, or even the next minor
release, is not good. We should be VERY conservative about removing
existing settings if there's any chance that somebody could use them
to tune their way out of trouble.

> My big issue with vacuum_freeze_min_age is that it doesn't really work
> with the freeze map work in 9.6, which creates problems that I'm
> trying to address by freezing early and so on. After all, HEAD (and
> all stable branches) can easily set a page to all-visible (but not
> all-frozen) in the VM, meaning that the page's tuples won't be
> considered for freezing until the next aggressive VACUUM. This means
> that vacuum_freeze_min_age is already frequently ignored by the
> implementation -- it's conditioned on other things that are practically
> impossible to predict.
>
> Curious about your thoughts on this existing issue with
> vacuum_freeze_min_age. I am concerned about the "freezing cliff" that
> it creates.

So, let's see: if we see a page where the tuples are all-visible and
we seize the opportunity to freeze it, we can spare ourselves the need
to ever visit that page again (unless it gets modified). But if we
only 

Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers

2022-01-07 Thread Andres Freund
Hi,

On 2022-01-06 23:24:40 -0800, Jeff Davis wrote:
> It feels like the sync quorum should always be ahead of the async
> replicas. Unless I'm missing a use case, or there is some kind of
> performance gotcha.

I don't see how it can *not* cause a major performance / latency
gotcha. You're deliberately delaying replication after all?

Synchronous replication doesn't guarantee *anything* about the ability of
other replicas to fail over. Nor would it after what's proposed here -
another sync replica would still not be guaranteed to be able to follow the
newly promoted primary.

To me this just sounds like trying to shoehorn something into syncrep that
it's not made for.

Greetings,

Andres Freund




Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?

2022-01-07 Thread Bossart, Nathan
On 11/23/21, 11:29 AM, "Thomas Munro"  wrote:
> Here's a patch for Linux and also FreeBSD.  The latter OS decided to
> turn on ASLR by default recently, causing my workstation to fail like
> this quite reliably, which reminded me to follow up with this.  It
> disables ASLR in pg_ctl and pg_regress, which is enough for check and
> check-world, but doesn't help you if you run the server directly
> (unlike the different hack done for macOS).
>
> For whatever random reason the failures are rarer on Linux (could be
> my imagination, but I think they might be clustered, I didn't look
> into the recipe for the randomness), but even without reproducing a
> failure it's clear to see using pmap that this has the right effect.
> I didn't bother with a check for the existence of ADDR_NO_RANDOMIZE
> because it's since 2.6.12 which is definitely ancient enough.

FWIW I just found this patch very useful for testing some EXEC_BACKEND
stuff on Linux.

Nathan



Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers

2022-01-07 Thread Jeff Davis
On Sat, 2022-01-08 at 00:13 +0530, Bharath Rupireddy wrote:
> If there are long running txns on the primary and the async standbys
> were to wait until quorum commit from sync standbys, won't they fall
> behind the primary by too much?

No, because replication is based on LSNs, not transactions.

With the proposed change: an LSN can be replicated to all sync replicas
as soon as it's durable on the primary; and an LSN can be replicated to
all async replicas as soon as it's durable on the primary *and* the
sync rep quorum is satisfied.
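In pseudo-code terms, the proposed gating is as simple as the following
sketch (invented helper names, not actual PostgreSQL functions):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for the real LSN type */

static XLogRecPtr
Min(XLogRecPtr a, XLogRecPtr b)
{
    return (a < b) ? a : b;
}

/* Sync walsenders may ship anything durable on the primary. */
static XLogRecPtr
sync_send_cutoff(XLogRecPtr durable_lsn, XLogRecPtr quorum_lsn)
{
    (void) quorum_lsn;
    return durable_lsn;
}

/* Async walsenders may only ship what the sync quorum has already
 * acknowledged. */
static XLogRecPtr
async_send_cutoff(XLogRecPtr durable_lsn, XLogRecPtr quorum_lsn)
{
    return Min(durable_lsn, quorum_lsn);
}
```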

Regards,
Jeff Davis






Re: libpq compression (part 2)

2022-01-07 Thread Justin Pryzby
On Fri, Jan 07, 2022 at 01:46:00PM -0500, Bruce Momjian wrote:
> On Sat, Jan  1, 2022 at 11:25:05AM -0600, Justin Pryzby wrote:
> > Thanks for working on this.  The patch looks to be in good shape - I hope 
> > more
> > people will help to review and test it.  I took the liberty of creating a 
> > new
> > CF entry. http://cfbot.cputube.org/daniil-zakhlystov.html
> > 
> > +zpq_should_compress(ZpqStream * zpq, char msg_type, uint32 msg_len)
> > +{
> > +   return zpq_choose_compressor(zpq, msg_type, msg_len) == -1;
> > 
> > I think this is backwards , and should say != -1 ?
> > 
> > As written, the server GUC libpq_compression defaults to "on", and the 
> > client
> > doesn't request compression.  I think the server GUC should default to off.
> I failed to convince Konstantin about this last year.  The reason is that 1)
> > it's a new feature; 2) with security implications.  An admin should need to
> > "opt in" to this.  I still wonder if this should be controlled by a new 
> > "TYPE"
> > in pg_hba (rather than a GUC); that would make it exclusive of SSL.
> 
> I assume this compression happens before it is encrypted for TLS
> transport.  Second, compression was removed from TLS because there were
> too many ways for HTTP to weaken encryption.  I assume the Postgres wire
> protocol doesn't have similar exploit possibilities.

It's discussed in last year's thread.  The thinking is that there tends to be
*fewer* exploitable opportunities between application->DB than between
browser->app.

But it's still a known concern, and should default to off - as I said.

That's also why I wondered if compression should be controlled by pg_hba,
rather than a GUC.  To require/allow a DBA to opt in to it for specific hosts.
Or to make it exclusive of ssl.  We could choose to not support that case at
all, or (depending on the implementation) refuse that combination of layers.
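The == -1 vs. != -1 point quoted above is easy to see in a stand-alone
sketch -- stub types and a stubbed chooser here; the real ZpqStream and
compressor negotiation are in the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stub stand-in for the patch's stream type. */
typedef struct ZpqStream ZpqStream;

/* Returns the index of the chosen compressor, or -1 for "none".
 * Stubbed: compress only large CopyData ('d') messages. */
static int
zpq_choose_compressor(ZpqStream *zpq, char msg_type, uint32_t msg_len)
{
    (void) zpq;
    return (msg_type == 'd' && msg_len >= 1024) ? 0 : -1;
}

/* The corrected predicate: compress when a compressor WAS chosen,
 * i.e. the chooser returned something other than -1. */
static int
zpq_should_compress(ZpqStream *zpq, char msg_type, uint32_t msg_len)
{
    return zpq_choose_compressor(zpq, msg_type, msg_len) != -1;
}
```

With == -1, as in the posted patch, the predicate would be inverted:
messages with no usable compressor would be "compressed" and vice versa.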

-- 
Justin




Re: libpq compression (part 2)

2022-01-07 Thread Bruce Momjian
On Sat, Jan  1, 2022 at 11:25:05AM -0600, Justin Pryzby wrote:
> Thanks for working on this.  The patch looks to be in good shape - I hope more
> people will help to review and test it.  I took the liberty of creating a new
> CF entry. http://cfbot.cputube.org/daniil-zakhlystov.html
> 
> +zpq_should_compress(ZpqStream * zpq, char msg_type, uint32 msg_len)
> +{
> +   return zpq_choose_compressor(zpq, msg_type, msg_len) == -1;
> 
> I think this is backwards , and should say != -1 ?
> 
> As written, the server GUC libpq_compression defaults to "on", and the client
> doesn't request compression.  I think the server GUC should default to off.
> I failed to convince Konstantin about this last year.  The reason is that 1)
> it's a new feature; 2) with security implications.  An admin should need to
> "opt in" to this.  I still wonder if this should be controlled by a new "TYPE"
> in pg_hba (rather than a GUC); that would make it exclusive of SSL.

I assume this compression happens before it is encrypted for TLS
transport.  Second, compression was removed from TLS because there were
too many ways for HTTP to weaken encryption.  I assume the Postgres wire
protocol doesn't have similar exploit possibilities.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers

2022-01-07 Thread Bharath Rupireddy
On Fri, Jan 7, 2022 at 12:54 PM Jeff Davis  wrote:
>
> On Wed, 2022-01-05 at 23:59 -0800, SATYANARAYANA NARLAPURAM wrote:
> > I would like to propose a GUC send_Wal_after_quorum_committed which
> > when set to ON, walsenders corresponds to async standbys and logical
> > replication workers wait until the LSN is quorum committed on the
> > primary before sending it to the standby. This not only simplifies
> > the post failover steps but avoids unnecessary downtime for the async
> > replicas. Thoughts?
>
> Do we need a GUC? Or should we just always require that sync rep is
> satisfied before sending to async replicas?
>
> It feels like the sync quorum should always be ahead of the async
> replicas. Unless I'm missing a use case, or there is some kind of
> performance gotcha.

IMO, having a GUC is a reasonable choice because some users might be
okay with their async replicas being ahead of the sync ones, or they
may have dealt with this problem already in their HA solutions, or
they may not want their async replicas to fall behind the primary
(most of the time).

If there are long running txns on the primary and the async standbys
were to wait until quorum commit from sync standbys, won't they fall
behind the primary by too much? This isn't a problem at all if we
think from the perspective that async replicas are anyway prone to
falling behind the primary. But if the primary continuously has long
running txns, the async replicas would eventually fall
behind more and more. Is there a way we can send the WAL records to
both sync and async replicas together but the async replicas won't
apply those WAL records until the primary tells the standbys that quorum
commit is obtained? If the quorum commit isn't obtained by the
primary, the async replicas can skip applying the WAL records and
discard them.
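That last idea can be sketched as a toy model -- all names invented, and
ignoring how the quorum-committed LSN would actually be transmitted:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BUFFERED 16

/* Toy async standby: receives WAL records immediately but defers
 * applying them until the primary reports a quorum-committed LSN
 * covering them. */
typedef struct
{
    uint64_t    buffered[MAX_BUFFERED]; /* received, not yet applied */
    int         nbuffered;
    uint64_t    last_applied_lsn;
} ToyAsyncReplica;

static void
replica_receive(ToyAsyncReplica *r, uint64_t lsn)
{
    assert(r->nbuffered < MAX_BUFFERED);
    r->buffered[r->nbuffered++] = lsn;
}

/* Apply (and drop from the buffer) every record at or below the
 * quorum-committed LSN announced by the primary; keep the rest. */
static void
replica_on_quorum_commit(ToyAsyncReplica *r, uint64_t quorum_lsn)
{
    int         keep = 0;

    for (int i = 0; i < r->nbuffered; i++)
    {
        if (r->buffered[i] <= quorum_lsn)
            r->last_applied_lsn = r->buffered[i];
        else
            r->buffered[keep++] = r->buffered[i];
    }
    r->nbuffered = keep;
}
```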

Regards,
Bharath Rupireddy.




Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers

2022-01-07 Thread Bossart, Nathan
On 1/6/22, 11:25 PM, "Jeff Davis"  wrote:
> On Wed, 2022-01-05 at 23:59 -0800, SATYANARAYANA NARLAPURAM wrote:
>> I would like to propose a GUC send_Wal_after_quorum_committed which
>> when set to ON, walsenders corresponds to async standbys and logical
>> replication workers wait until the LSN is quorum committed on the
>> primary before sending it to the standby. This not only simplifies
>> the post failover steps but avoids unnecessary downtime for the async
>> replicas. Thoughts?
>
> Do we need a GUC? Or should we just always require that sync rep is
> satisfied before sending to async replicas?
>
> It feels like the sync quorum should always be ahead of the async
> replicas. Unless I'm missing a use case, or there is some kind of
> performance gotcha.

I don't have a strong opinion on whether there needs to be a GUC, but
+1 for the ability to enforce sync quorum before sending WAL to async
standbys.  I think that would be a reasonable default behavior.

Nathan



Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-01-07 Thread Bossart, Nathan
On 1/7/22, 5:52 AM, "David Steele"  wrote:
> On 1/6/22 20:20, Euler Taveira wrote:
>> On Thu, Jan 6, 2022, at 9:48 PM, Bossart, Nathan wrote:
>>> After a quick glance, I didn't see an easy way to hold a session open
>>> while the test does other things.  If there isn't one, modifying
>>> backup_fs_hot() to work with non-exclusive mode might be more trouble
>>> than it is worth.
>>
>> You can use IPC::Run to start psql in background. See examples in
>> src/test/recovery.
>
> I don't think updating backup_fs_hot() is worth it here.
>
> backup_fs_cold() works just fine for this case and if there is a need
> for backup_fs_hot() in the future it can be implemented as needed.

Thanks for the pointer on IPC::Run.  I had a feeling I was missing
something obvious!

I think I agree with David that it still isn't worth it for just this
one test.  Of course, it would be great to test the non-exclusive
backup logic as much as possible, but I'm not sure that this
particular test will provide any sort of meaningful coverage.

Nathan



Re: row filtering for logical replication

2022-01-07 Thread Euler Taveira
On Fri, Jan 7, 2022, at 6:05 AM, Amit Kapila wrote:
> Euler, I have one more question about this patch for you. I see that
> in the patch we are calling coerce_to_target_type() in
> pgoutput_row_filter_init_expr() but do we really need the same? We
> already do that via
> transformPubWhereClauses->transformWhereClause->coerce_to_boolean
> before storing where clause expression. It is not clear to me why that
> is required? We might want to add a comment if that is required.
It is redundant. It seems to be an additional safeguard that should be removed.
Good catch.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers

2022-01-07 Thread SATYANARAYANA NARLAPURAM
On Fri, Jan 7, 2022 at 12:27 AM Kyotaro Horiguchi 
wrote:

> At Thu, 6 Jan 2022 23:55:01 -0800, SATYANARAYANA NARLAPURAM <
> satyanarlapu...@gmail.com> wrote in
> > On Thu, Jan 6, 2022 at 11:24 PM Jeff Davis  wrote:
> >
> > > On Wed, 2022-01-05 at 23:59 -0800, SATYANARAYANA NARLAPURAM wrote:
> > > > I would like to propose a GUC send_Wal_after_quorum_committed which
> > > > when set to ON, walsenders corresponds to async standbys and logical
> > > > replication workers wait until the LSN is quorum committed on the
> > > > primary before sending it to the standby. This not only simplifies
> > > > the post failover steps but avoids unnecessary downtime for the async
> > > > replicas. Thoughts?
> > >
> > > Do we need a GUC? Or should we just always require that sync rep is
> > > satisfied before sending to async replicas?
> > >
> >
> > I proposed a GUC to not introduce a behavior change by default. I have no
> > strong opinion on having a GUC or making the proposed behavior default,
> > would love to get others' perspectives as well.
> >
> >
> > >
> > > It feels like the sync quorum should always be ahead of the async
> > > replicas. Unless I'm missing a use case, or there is some kind of
> > > performance gotcha.
> > >
> >
> > I couldn't think of a case that can cause serious performance issues but
> > will run some experiments on this and post the numbers.
>
> I think Jeff is saying that "quorum commit" already by definition
> means that all out-of-quorum standbys are behind of the
> quorum-standbys.  I agree to that in a dictionary sense. But I can
> think of the case where the response from the top-runner standby
> vanishes or gets caught somewhere on the network for some reason. In that
> case the primary happily checks quorum ignoring the top-runner.
>
> To avoid that misdecision, I can guess two possible "solutions".
>
> One is to serialize WAL sending (which is of course unacceptable);
> the other is to send WAL to all standbys at once, then make the
> decision only after making sure we have received replies from all
> standbys (this is no longer quorum commit, in another sense..)
>

There is no need to serialize sending the WAL among sync standbys. The only
serialization required is to send first to all the sync replicas and then to
the async replicas, if any. Once an LSN is quorum committed, no failover
subsystem initiates an automatic failover such that the LSN is lost (data loss)
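
To make the proposed ordering concrete, here is an illustrative Python sketch
(not from any patch; all names are invented): compute the quorum-committed LSN
from the sync standbys' flush positions, then cap what async walsenders are
allowed to send at that LSN.

```python
def quorum_committed_lsn(sync_flush_lsns, quorum):
    # Under synchronous_standby_names = 'ANY <quorum> (...)', the
    # quorum-committed LSN is the highest LSN confirmed as flushed by at
    # least <quorum> sync standbys, i.e. the quorum-th highest flush position.
    if quorum <= 0 or len(sync_flush_lsns) < quorum:
        return 0
    return sorted(sync_flush_lsns, reverse=True)[quorum - 1]

def async_send_upto(primary_insert_lsn, sync_flush_lsns, quorum):
    # The proposal: async walsenders (and logical workers) never send WAL
    # past the quorum-committed LSN, so failing over to an async replica
    # cannot expose data the sync quorum never acknowledged.
    return min(primary_insert_lsn,
               quorum_committed_lsn(sync_flush_lsns, quorum))
```

With flush positions [100, 90, 70] and a quorum of 2, the quorum-committed LSN
is 90, so async standbys would be held at 90 even if the primary has inserted
up to 120.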

>
> So I'm afraid that there's no sensible solution to avoid the
> hiding-forerunner problem on quorum commit.
>

Could you elaborate on the problem here?


>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>


Re: Python Plain Text Sender

2022-01-07 Thread Justin Pryzby
On Fri, Jan 07, 2022 at 03:48:24PM +0300, Ali Koca wrote:
> Dear Hackers,
> 
> I decided we need a basic e-mail sender in the command line, for when
> coding things and asking one-line questions.

Why ?  Are there mail clients for which this is hard to do ?

I don't think a custom MUA/mail client is something we should implement nor
maintain.  Unless it does something better than other mail clients (and doesn't
do anything worse than most).

I think this is mostly an issue of user awareness.  Providing a script to do
this won't help users who don't realize that their MUA is sending only HTML
gobbledygook.

BTW, why did you send the same email again 5 hours later ?

-- 
Justin




Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-07 Thread Simon Riggs
On Fri, 7 Jan 2022 at 16:09, Justin Pryzby  wrote:
>
> On Fri, Jan 07, 2022 at 03:53:51PM +, Finnerty, Jim wrote:
> > I'd still like a plan to retire the "double xmax" representation 
> > eventually.  Previously I suggested that this could be done as a 
> > post-process, before upgrade is complete, but that could potentially make 
> > upgrade very slow.
> >
> > Another way to retire the "double xmax" representation eventually could be 
> > to disallow "double xmax" pages in subsequent major version upgrades (e.g. 
> > to PG16, if "double xmax" pages are introduced in PG15).  This gives the 
> > luxury of time after a fast upgrade to convert all pages to contain the 
> > epochs, while still providing a path to more maintainable code in the 
> > future.
>
> Yes, but how are you planning to rewrite it?  Is vacuum enough?

Probably not, but VACUUM is the place to add such code.

> I suppose it'd need FREEZE + DISABLE_PAGE_SKIPPING ?

Yes

> This would preclude upgrading "across" v15.  Maybe that'd be okay, but it'd be
> a new and atypical restriction.

I don't see that restriction. Anyone upgrading from before PG15 would
apply the transform. Just because we introduce a transform in PG15
doesn't mean we can't apply that transform in later releases as well,
to allow say PG14 -> PG16.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: ICU for global collation

2022-01-07 Thread Daniel Verite
Julien Rouhaud wrote:

> If you want a database with an ICU default collation the lc_collate
> and lc_ctype should inherit what's in the template database and not
> what was provided in the LOCALE I think.  You could still probably
> overload them in some scenario, but without a list of what isn't
> ICU-aware I can't really be sure of how often one might have to do
> it.

I guess we'd need that when creating a database with a different
encoding than the template databases, at least.

About what's not ICU-aware, I believe the most significant part in
core is the Full Text Search parser.
It doesn't care about sorting strings, but it relies on the functions
from POSIX <ctype.h>, which depend on LC_CTYPE
(it looks however that this could be improved by following
what has been done in backend/regex/regc_pg_locale.c, which has
comparable needs and calls ICU functions when applicable).

Also, any extension is potentially concerned. Surely many
extensions call functions from ctype.h assuming that
the current value of LC_CTYPE works with the data they handle.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




[no subject]

2022-01-07 Thread Ali Koca

Dear Hackers,

I decided we need a basic e-mail sender in the command line, for when coding
things and asking one-line questions.


Ali Koca

From 21be4e168fae83bfcd67b219d6e4234a48f0f4f9 Mon Sep 17 00:00:00 2001
From: Ali Koca 
Date: Fri, 7 Jan 2022 15:28:12 +0300
Subject: [PATCH] sending plain text e-mail

---
 src/tools/plain_text_mail.py | 26 ++
 1 file changed, 26 insertions(+)
 create mode 100644 src/tools/plain_text_mail.py

diff --git a/src/tools/plain_text_mail.py b/src/tools/plain_text_mail.py
new file mode 100644
index 00..8b36871874
--- /dev/null
+++ b/src/tools/plain_text_mail.py
@@ -0,0 +1,26 @@
+"""-
+***
+*
+* plain_text_mail.py
+*   - This tool sends plain text emails.
+*
+* Created by: Ali Koca
+* Copyright (c) 2017-2021, PostgreSQL Global Development Group
+*
+***-"""
+
+import smtplib, ssl
+import os
+
+port = 465
+smtp_server = os.getenv("SMTP_SERVER")
+sender_email = os.getenv("EMAIL_ADDRESS")
+
+receiver_email = input("Enter receiver email: ")
+password = input("Type your password and press enter: ")
+message = input("Enter your message: ")
+
+context = ssl.create_default_context()
+with smtplib.SMTP_SSL(smtp_server, port, context=context) as server:
+    server.login(sender_email, password)
+    server.sendmail(sender_email, receiver_email, message)
\ No newline at end of file
-- 
2.32.0.windows.1



Re: make MaxBackends available in _PG_init

2022-01-07 Thread Fujii Masao




On 2021/08/16 13:02, Bossart, Nathan wrote:

On 8/15/21, 1:05 AM, "wangsh.f...@fujitsu.com"  wrote:

I don't think calling function GetMaxBackends() in the for loop is a good idea.
How about use a temp variable to save the return value of function 
GetMaxBackends() ?


I did this in v4.  There may be a couple of remaining places that call
GetMaxBackends() several times, but the function should be relatively
inexpensive.


The patch handles only MaxBackends. But aren't there other variables having the
same issue?

It seems overkill to remove "extern" from MaxBackends and replace MaxBackends with
GetMaxBackends() throughout the existing PostgreSQL code. I'm not sure how much it's
actually worth doing that.  Instead, isn't it enough to just add a comment like "Use
GetMaxBackends() if you want to treat the lookup of an uninitialized MaxBackends as
an error" to the definition of MaxBackends?
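
For illustration only, a Python stand-in for the accessor pattern under
discussion (the real code is C; names follow the thread): the point of wrapping
the bare global is that a lookup before initialization fails loudly instead of
silently returning zero.

```python
_max_backends = None  # stand-in for the C global MaxBackends

def initialize_max_backends(n):
    # Called once, after the GUC-derived value is known (postmaster startup).
    global _max_backends
    assert _max_backends is None, "MaxBackends already initialized"
    _max_backends = n

def get_max_backends():
    # Treat a lookup of uninitialized MaxBackends as an error -- the
    # situation _PG_init() of a preloaded extension can otherwise hit.
    if _max_backends is None:
        raise RuntimeError("MaxBackends is not yet initialized")
    return _max_backends
```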

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-07 Thread Bruce Momjian
On Fri, Jan 7, 2022 at 03:53:51PM +, Finnerty, Jim wrote:
> Re:  The "prepare" approach was the first tried.
> https://github.com/postgrespro/pg_pageprep But it appears to be
> very difficult and unreliable.  After investing many months into
> pg_pageprep, "double xmax" approach appears to be very fast to
> implement and reliable.
>
> I'd still like a plan to retire the "double xmax" representation
> eventually.  Previously I suggested that this could be done as a
> post-process, before upgrade is complete, but that could potentially
> make upgrade very slow.
>
> Another way to retire the "double xmax" representation eventually
> could be to disallow "double xmax" pages in subsequent major version
> upgrades (e.g. to PG16, if "double xmax" pages are introduced in
> PG15).  This gives the luxury of time after a fast upgrade to convert
> all pages to contain the epochs, while still providing a path to more
> maintainable code in the future.

This gets into the entire issue we have discussed in the past but never
resolved --- how do we manage state changes in the Postgres file format
while the server is running?  pg_upgrade and pg_checksums avoid the
problem by doing such changes while the server is down, and other file
formats have avoided it by allowing perpetual reading of the old format.

Any such non-perpetual changes while the server is running must deal
with recording the start of the state change, the completion of it,
communicating such state changes to all running backends in a
synchronous manner, and possible restarting of the state change.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: enhance pg_log_backend_memory_contexts() to log memory contexts of auxiliary processes

2022-01-07 Thread Bharath Rupireddy
On Fri, Jan 7, 2022 at 9:19 PM Fujii Masao  wrote:
>
> On 2021/11/29 11:44, vignesh C wrote:
> > Thanks for the updated patch. The patch applies neatly, make
> > check-world passes and the documentation looks good. I did not find
> > any issues with the v6 patch, I'm marking the patch as  Ready for
> > Committer.
>
> I started reading the patch.

Thanks.

> +CREATE FUNCTION memcxt_get_proc_pid(text)
> +  RETURNS int
> +  LANGUAGE SQL
> +  AS 'SELECT pid FROM pg_stat_activity WHERE backend_type = $1';
> +
> +SELECT pg_log_backend_memory_contexts(memcxt_get_proc_pid('checkpointer'));
> +
> +DROP FUNCTION memcxt_get_proc_pid(text);
>
> Why is memcxt_get_proc_pid() still necessary? ISTM that we can just replace 
> the above with the following query, instead.
>
>  SELECT pg_log_backend_memory_contexts(pid) FROM pg_stat_activity WHERE 
> backend_type = 'checkpointer'

Changed.

> I'm tempted to replace these descriptions as follows. Because the following 
> looks simpler and easier to read and understand, to me.
> --
> Requests to log the memory contexts of the backend with the specified process 
> ID.  This function can send the request to also auxiliary processes except 
> logger and stats collector.
> --

Changed.

> As Horiguchi-san told upthread, IMO it's simpler not to use is_aux_proc flag. 
> For example, you can replace this code with
>
> 
> proc = BackendPidGetProc(pid);
>
> if (proc != NULL)
>backendId = proc->backendId;
> else
>proc = AuxiliaryPidGetProc(pid);
> 

Changed.

PSA v7 patch.

Regards,
Bharath Rupireddy.


v7-0001-enhance-pg_log_backend_memory_contexts-to-log-mem.patch
Description: Binary data


Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-07 Thread Peter Eisentraut

On 07.01.22 06:18, Fujii Masao wrote:

On 2022/01/06 19:24, Simon Riggs wrote:

On Thu, 30 Dec 2021 at 13:19, Maxim Orlov  wrote:


Your opinions are very much welcome!


This is a review of the Int64 options patch,
"v6-0001-Add-64-bit-GUCs-for-xids.patch"


Do we really need to support both int32 and int64 options? Isn't it 
enough to replace the existing int32 option with int64 one?


I think that would create a lot of problems.  You'd have to change every 
underlying int variable to int64, and then check whether that causes any 
issues where they are used (wrong printf format, assignments, 
overflows), and you'd have to check whether the existing limits are 
still appropriate.  And extensions would be upset.  This would be a big 
mess.


Or how about 
using string-type option for very large number like 64-bit XID, like 
it's done for recovery_target_xid?


Seeing how many variables that contain transaction ID information 
actually exist, I think it could be worth introducing a new category as 
proposed.  Otherwise, you'd have to write a lot of check and assign hooks.


I do wonder whether signed vs. unsigned is handled correctly. 
Transaction IDs are unsigned, but all GUC handling is signed.
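
To make the signed-vs-unsigned concern concrete: a full-range unsigned 64-bit
XID does not fit in a signed 64-bit GUC slot without reinterpretation. A sketch
of the usual two's-complement round-trip (illustrative only, not from the
patch):

```python
def xid64_to_int64(xid):
    # Store an unsigned 64-bit XID in a signed 64-bit slot.
    assert 0 <= xid < (1 << 64)
    return xid - (1 << 64) if xid >= (1 << 63) else xid

def int64_to_xid64(v):
    # Recover the unsigned XID from its signed representation.
    assert -(1 << 63) <= v < (1 << 63)
    return v & ((1 << 64) - 1)
```

XIDs past 2^63 come back as negative int64 values, which is exactly the kind of
mismatch that range checks and printf formats in the GUC machinery would need
to handle.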





Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-07 Thread Justin Pryzby
On Fri, Jan 07, 2022 at 03:53:51PM +, Finnerty, Jim wrote:
> I'd still like a plan to retire the "double xmax" representation eventually.  
> Previously I suggested that this could be done as a post-process, before 
> upgrade is complete, but that could potentially make upgrade very slow. 
> 
> Another way to retire the "double xmax" representation eventually could be to 
> disallow "double xmax" pages in subsequent major version upgrades (e.g. to 
> PG16, if "double xmax" pages are introduced in PG15).  This gives the luxury 
> of time after a fast upgrade to convert all pages to contain the epochs, 
> while still providing a path to more maintainable code in the future.

Yes, but how are you planning to rewrite it?  Is vacuum enough?
I suppose it'd need FREEZE + DISABLE_PAGE_SKIPPING ?

This would preclude upgrading "across" v15.  Maybe that'd be okay, but it'd be
a new and atypical restriction.

How would you enforce that it'd been run on v15 before upgrading to pg16 ?

You'd need to track whether vacuum had completed the necessary steps in pg15.
I don't think it'd be okay to make pg_upgrade --check read every tuple.

The "keeping track" part is what reminds me of the online checksum patch.
It'd be ideal if there were a generic solution to this kind of task, or at
least a "model" process to follow.

-- 
Justin




Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-07 Thread Finnerty, Jim
Re:  The "prepare" approach was the first tried.
https://github.com/postgrespro/pg_pageprep
But it appears to be very difficult and unreliable.  After investing
many months into pg_pageprep, the "double xmax" approach appears to be
very fast to implement and reliable.

I'd still like a plan to retire the "double xmax" representation eventually.  
Previously I suggested that this could be done as a post-process, before 
upgrade is complete, but that could potentially make upgrade very slow. 

Another way to retire the "double xmax" representation eventually could be to 
disallow "double xmax" pages in subsequent major version upgrades (e.g. to 
PG16, if "double xmax" pages are introduced in PG15).  This gives the luxury of 
time after a fast upgrade to convert all pages to contain the epochs, while 
still providing a path to more maintainable code in the future.



Re: enhance pg_log_backend_memory_contexts() to log memory contexts of auxiliary processes

2022-01-07 Thread Fujii Masao




On 2021/11/29 11:44, vignesh C wrote:

Thanks for the updated patch. The patch applies neatly, make
check-world passes and the documentation looks good. I did not find
any issues with the v6 patch, I'm marking the patch as  Ready for
Committer.


I started reading the patch.

+CREATE FUNCTION memcxt_get_proc_pid(text)
+  RETURNS int
+  LANGUAGE SQL
+  AS 'SELECT pid FROM pg_stat_activity WHERE backend_type = $1';
+
+SELECT pg_log_backend_memory_contexts(memcxt_get_proc_pid('checkpointer'));
+
+DROP FUNCTION memcxt_get_proc_pid(text);

Why is memcxt_get_proc_pid() still necessary? ISTM that we can just replace the 
above with the following query, instead.

SELECT pg_log_backend_memory_contexts(pid) FROM pg_stat_activity WHERE 
backend_type = 'checkpointer'

-Requests to log the memory contexts of the backend with the
-specified process ID.  These memory contexts will be logged at
-LOG message level. They will appear in
-the server log based on the log configuration set
-(See  for more information),
-but will not be sent to the client regardless of
+Requests to log memory contexts of the backend
+or the WAL sender or
+the auxiliary process
+with the specified process ID. This function cannot request
+postmaster process or
+logger or
+statistics collector
+(all other auxiliary processes

ISTM that you're trying to list all possible processes that 
pg_log_backend_memory_contexts() can handle. But why didn't you list autovacuum worker 
(while other special backend, WAL sender, is picked up) and background worker like 
logical replication launcher? Because the term "backend" implicitly includes 
those processes? If so, why did you pick up WAL sender separately?

I'm tempted to replace these descriptions as follows. Because the following 
looks simpler and easier to read and understand, to me.

--
Requests to log the memory contexts of the process with the specified process 
ID.  Possible processes that this function can send the request to are: 
backend, WAL sender, autovacuum worker, auxiliary processes except logger and 
stats collector, and background workers.
--

or

--
Requests to log the memory contexts of the backend with the specified process 
ID.  This function can send the request to also auxiliary processes except 
logger and stats collector.
--

+   /* See if the process with given pid is an auxiliary process. */
+   if (proc == NULL)
+   {
+   proc = AuxiliaryPidGetProc(pid);
+   is_aux_proc = true;
+   }

As Horiguchi-san told upthread, IMO it's simpler not to use is_aux_proc flag. 
For example, you can replace this code with


proc = BackendPidGetProc(pid);

if (proc != NULL)
  backendId = proc->backendId;
else
  proc = AuxiliaryPidGetProc(pid);



Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-07 Thread Finnerty, Jim
Re: clog page numbers, as returned by TransactionIdToPage

-   int pageno = TransactionIdToPage(xid);  /* get page of parent */
+   int64   pageno = TransactionIdToPage(xid);  /* get page of parent */

...

-   int pageno = TransactionIdToPage(subxids[0]);
+   int64   pageno = TransactionIdToPage(subxids[0]);
int offset = 0;
int i = 0;
 
...

-   int nextpageno;
+   int64   nextpageno;

Etc.

In all those places where you are replacing int with int64 for the kind of 
values returned by TransactionIdToPage(), would you mind replacing the int64's 
with a type name, such as ClogPageNumber, for improved code maintainability?




Re: CREATEROLE and role ownership hierarchies

2022-01-07 Thread Joshua Brindle
On Wed, Jan 5, 2022 at 7:05 PM Mark Dilger  wrote:

> > On Jan 4, 2022, at 12:47 PM, Joshua Brindle 
> >  wrote:
> >
> >> I was able to reproduce that using REASSIGN OWNED BY to cause a user to 
> >> own itself.  Is that how you did it, or is there yet another way to get 
> >> into that state?
> >
> > I did:
> > ALTER ROLE brindle OWNER TO brindle;
>
> Ok, thanks.  I have rebased, fixed both REASSIGN OWNED BY and ALTER ROLE .. 
> OWNER TO cases, and added regression coverage for them.
>
> The last patch set to contain significant changes was v2, with v3 just being 
> a rebase.  Relative to those sets:
>
> 0001 -- rebased.
> 0002 -- rebased; extend AlterRoleOwner_internal to disallow making a role its 
> own immediate owner.
> 0003 -- rebased; extend AlterRoleOwner_internal to disallow cycles in the 
> role ownership graph.
> 0004 -- rebased.
> 0005 -- new; removes the broken pg_auth_members.grantor field.
>

LGTM +1




Re: Fix vcregress plpython3 warning

2022-01-07 Thread Juan José Santamaría Flecha
On Fri, Jan 7, 2022 at 3:24 PM Andrew Dunstan  wrote:

>
> In that case, just this should work:
>
> s/EXTENSION (\S*?)plpython2?u/EXTENSION $1plpython3u/g ;
>
> LGTM.

Regards,

Juan José Santamaría Flecha


Re: pgsql: Refactor tar method of walmethods.c to rely on the compression m

2022-01-07 Thread Christoph Berg
Re: Michael Paquier
> Refactor tar method of walmethods.c to rely on the compression method

Hi,

since about this commit, pg_wal.tar is no longer compressed at all:

$ pg_basebackup -D foo --format=tar
$ ls -l foo/
-rw-------  1 cbe cbe   137152  7. Jan 15:37 backup_manifest
-rw-------  1 cbe cbe 23606272  7. Jan 15:37 base.tar
-rw-------  1 cbe cbe 16778752  7. Jan 15:37 pg_wal.tar

$ pg_basebackup -D foogz --format=tar --gzip
$ ls -l foogz
-rw-------  1 cbe cbe   137152  7. Jan 15:37 backup_manifest
-rw-------  1 cbe cbe  3073257  7. Jan 15:37 base.tar.gz
-rw-------  1 cbe cbe 16779264  7. Jan 15:37 pg_wal.tar <-- should be pg_wal.tar.gz

Christoph




Re: ICU for global collation

2022-01-07 Thread Peter Eisentraut

On 04.01.22 17:03, Peter Eisentraut wrote:
There are really a lot of places with this new code.  Maybe it could be some
new function/macro to wrap that for the normal case (e.g. not formatting.c)?


Right, we could just put this into pg_newlocale_from_collation(), but 
the comment there says


  * In fact, they shouldn't call this function at all when they are dealing
  * with the default locale.  That can save quite a bit in hotspots.

I don't know how to assess that.


I tested this a bit.  I used the following setup:

create table t1 (a text);
insert into t1 select md5(generate_series(1, 1000)::text);
select count(*) from t1 where a > '';

And then I changed in varstr_cmp():

if (collid != DEFAULT_COLLATION_OID)
mylocale = pg_newlocale_from_collation(collid);

to just

mylocale = pg_newlocale_from_collation(collid);

I find that the \timing results are indistinguishable.  (I used locale 
"en_US.UTF-8" and made sure that that code path is actually hit.)


Does anyone have other insights?




Re: Fix vcregress plpython3 warning

2022-01-07 Thread Andrew Dunstan


On 1/7/22 08:56, Juan José Santamaría Flecha wrote:
>
> On Fri, Jan 7, 2022 at 2:30 PM Andrew Dunstan  wrote:
>
>
> Yeah, this code is not a model of clarity though. I had to think
> through
> it and I write quite a bit of perl. I would probably write it
> something
> like this:
>
>
> s/EXTENSION (.*?)plpython2?u/EXTENSION $1plpython3u/g ;
>
> Yeah, I had to do some testing to figure it out. Based on
> what regress-python3-mangle.mk 
> does, I think it tries to ignore cases such as:
>
> DROP EXTENSION IF EXISTS plpython2u CASCADE;
>
> Which that expression would match. Maybe use a couple of lines as in
> the make file?
>
> s/EXTENSION plpython2?u/EXTENSION plpython3u/g
> s/EXTENSION ([^ ]*)_plpython2?u/EXTENSION \$1_plpython3u/g
>
>

In that case, just this should work:


s/EXTENSION (\S*?)plpython2?u/EXTENSION $1plpython3u/g ;


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
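
For what it's worth, a quick sanity check of that expression (a Python
stand-in for the Perl substitution) shows it mangles both plain and prefixed
extension names while leaving `DROP EXTENSION IF EXISTS plpython2u` lines
alone, since `\S*?` cannot match across the space:

```python
import re

# Python equivalent of: s/EXTENSION (\S*?)plpython2?u/EXTENSION ${1}plpython3u/g
pat = re.compile(r"EXTENSION (\S*?)plpython2?u")

def mangle(line):
    return pat.sub(r"EXTENSION \g<1>plpython3u", line)

print(mangle("CREATE EXTENSION plpythonu;"))           # CREATE EXTENSION plpython3u;
print(mangle("CREATE EXTENSION hstore_plpython2u;"))   # CREATE EXTENSION hstore_plpython3u;
print(mangle("DROP EXTENSION IF EXISTS plpython2u CASCADE;"))  # unchanged
```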





Re: Fix vcregress plpython3 warning

2022-01-07 Thread Juan José Santamaría Flecha
On Fri, Jan 7, 2022 at 2:56 PM Juan José Santamaría Flecha <
juanjo.santama...@gmail.com> wrote:
copy-paste

> s/EXTENSION plpython2?u/EXTENSION plpython3u/g
> s/EXTENSION ([^ ]*)_plpython2?u/EXTENSION $1_plpython3u/g
>


Re: Fix vcregress plpython3 warning

2022-01-07 Thread Juan José Santamaría Flecha
On Fri, Jan 7, 2022 at 2:30 PM Andrew Dunstan  wrote:

>
> Yeah, this code is not a model of clarity though. I had to think through
> it and I write quite a bit of perl. I would probably write it something
> like this:
>
>
> s/EXTENSION (.*?)plpython2?u/EXTENSION $1plpython3u/g ;
>
> Yeah, I had to do some testing to figure it out. Based on what
regress-python3-mangle.mk does, I think it tries to ignore cases such as:

DROP EXTENSION IF EXISTS plpython2u CASCADE;

Which that expression would match. Maybe use a couple of lines as in the
make file?

s/EXTENSION plpython2?u/EXTENSION plpython3u/g
s/EXTENSION ([^ ]*)_plpython2?u/EXTENSION \$1_plpython3u/g

Regards,

Juan José Santamaría Flecha


Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-01-07 Thread David Steele

On 1/6/22 20:20, Euler Taveira wrote:

On Thu, Jan 6, 2022, at 9:48 PM, Bossart, Nathan wrote:

After a quick glance, I didn't see an easy way to hold a session open
while the test does other things.  If there isn't one, modifying
backup_fs_hot() to work with non-exclusive mode might be more trouble
than it is worth.

>

You can use IPC::Run to start psql in background. See examples in
src/test/recovery.


I don't think updating backup_fs_hot() is worth it here.

backup_fs_cold() works just fine for this case and if there is a need 
for backup_fs_hot() in the future it can be implemented as needed.


Regards,
-David




Re: row filtering for logical replication

2022-01-07 Thread Euler Taveira
On Fri, Jan 7, 2022, at 3:35 AM, Amit Kapila wrote:
> On Fri, Jan 7, 2022 at 9:44 AM Amit Kapila  wrote:
> >
> > On Thu, Jan 6, 2022 at 6:42 PM Euler Taveira  wrote:
> > >
> > > IMO we shouldn't reuse ReorderBufferChangeType. For a long-term solution, 
> > > it is
> > > fragile. ReorderBufferChangeType has values that do not matter for row 
> > > filter
> > > and it relies on the fact that REORDER_BUFFER_CHANGE_INSERT,
> > > REORDER_BUFFER_CHANGE_UPDATE and REORDER_BUFFER_CHANGE_DELETE are the 
> > > first 3
> > > values from the enum, otherwise, it breaks rfnodes and no_filters in
> > > pgoutput_row_filter().
> > >
> >
> > I think you mean to say it will break in pgoutput_row_filter_init(). I
> > see your point but OTOH, if we do what you are suggesting then don't
> > we need an additional mapping between ReorderBufferChangeType and
> > RowFilterPublishAction as row filter and pgoutput_change API need to
> > use those values.
> >
> 
> Can't we use 0,1,2 as indexes for rfnodes/no_filters based on change
> type as they are local variables as that will avoid the fragileness
> you are worried about. I am slightly hesitant to introduce new enum
> when we are already using reorder buffer change type in pgoutput.c.
WFM. I used numbers + comments in a previous patch set [1]. I suggested the enum
because each command would be self-explanatory.

[1] 
https://www.postgresql.org/message-id/49ba49f1-8bdb-40b7-ae9e-f17d88b3afcd%40www.fastmail.com


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Fix vcregress plpython3 warning

2022-01-07 Thread Andrew Dunstan


On 1/7/22 07:20, Juan José Santamaría Flecha wrote:
> Hi,
>
> When running the plcheck in Windows we get the following warning, it
> is visible in the cfbots [1]:
>
> Use of uninitialized value $1 in concatenation (.) or string at
> src/tools/msvc/vcregress.pl  line 350.
>
> This points to mangle_plpython3 subroutine. The attached patch
> addresses the problem.
>
> [1] http://commitfest.cputube.org/


Yeah, this code is not a model of clarity though. I had to think through
it and I write quite a bit of perl. I would probably write it something
like this:


s/EXTENSION (.*?)plpython2?u/EXTENSION $1plpython3u/g ;


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: sqlsmith: ERROR: XX000: bogus varno: 2

2022-01-07 Thread Amit Langote
On Fri, Jan 7, 2022 at 12:24 AM Tom Lane  wrote:
> Amit Langote  writes:
> > Thanks for addressing that in the patch you posted.  I guess fixing
> > only expression_tree_walker/mutator() suffices for now, but curious to
> > know if it was intentional that you decided not to touch the following
> > sites:
>
> > exprCollation(): it would perhaps make sense to return the collation
> > assigned to the 1st element of listdatums/lowerdatums/upperdatums,
> > especially given that transformPartitionBoundValue() does assign a
> > collation to the values in those lists based on the parent's partition
> > key specification.
>
> But each column could have a different collation, no?  I do not
> think it's sensible to pick one of those at random and claim
> that's the collation of the whole thing.  So throwing an error
> seems appropriate.
>
> > exprType(): could be handled similarly
>
> The same, in spades.  Anybody who is asking for "the type"
> of a relpartbound is misguided.

Okay, agree there's no need for handling bound nodes in these
functions.  Most sites that need to see the collation/type OID for
bound datums work directly with the individual elements of those lists
anyway.

> > queryjumble.c: JumbleExpr(): whose header comment says:
>
> If somebody needs that, I wouldn't object to adding support there.
> But right now it would just be dead code, so why bother?

Sure, makes sense.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Column Filtering in Logical Replication

2022-01-07 Thread Amit Kapila
On Fri, Jan 7, 2022 at 5:16 PM Peter Eisentraut
 wrote:
>
> src/backend/commands/tablecmds.c:
>
> ATExecReplicaIdentity(): Regarding the question of how to handle
> REPLICA_IDENTITY_NOTHING: I see two ways to do this.  Right now, the
> approach is that the user can set the replica identity freely, and we
> decide later based on that what we can replicate (e.g., no updates).
>

+1. This is what we are trying to do with the row filter patch. It
seems Hou-San has also mentioned the same on this thread [1].

> For this patch, that would mean we don't restrict what columns can be
> in the column list, but we check what actions we can replicate based
> on the column list.  The alternative is that we require the column
> list to include the replica identity, as the patch is currently doing,
> which would mean that REPLICA_IDENTITY_NOTHING can be allowed since
> it's essentially a set of zero columns.
>
> I find the current behavior a bit weird on reflection.  If a user
> wants to replicate on some columns and only INSERTs, that should be
> allowed regardless of what the replica identity columns are.
>

Right, I also raised the same point [2] related to INSERTs.

[1] - 
https://www.postgresql.org/message-id/OS0PR01MB5716330FFE3803DF887D073C94789%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] - 
https://www.postgresql.org/message-id/CAA4eK1%2BFoJ-J7wUG5s8zCtY0iBuN9LcjQcYhV4BD17xhuHfoug%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: RFC: Logging plan of the running query

2022-01-07 Thread Fujii Masao



On 2022/01/07 20:58, torikoshia wrote:

On 2022-01-07 14:30, torikoshia wrote:


Updated the patch to fix a compiler warning about the format on Windows.


I got another compiler warning, updated the patch again.


Thanks for updating the patch!

I ran the following query every 0.1s by using \watch psql command from three 
different sessions while make installcheck test was running.

SELECT pg_log_query_plan(pid) FROM pg_stat_activity;


And then I got the segmentation fault as follows.

2022-01-07 21:40:32 JST [postmaster] LOG:  server process (PID 51017) was 
terminated by signal 11: Segmentation fault: 11
2022-01-07 21:40:32 JST [postmaster] DETAIL:  Failed process was running: 
select description, (test_conv(inbytes, 'utf8', 'utf8')).* from 
utf8_verification_inputs;
2022-01-07 21:40:32 JST [postmaster] LOG:  terminating any other active server 
processes
2022-01-07 21:40:32 JST [postmaster] LOG:  all server processes terminated; 
reinitializing


The backtrace I got from the core file was:

(lldb) target create --core "/cores/core.51017"
Core file '/cores/core.51017' (x86_64) was loaded.
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x7fff20484552 libsystem_platform.dylib`_platform_strlen + 18
frame #1: 0x00011076e36c postgres`dopr(target=0x7ffedfd1b1d8, 
format="\n", args=0x7ffedfd1b450) at snprintf.c:444:20
frame #2: 0x00011076e133 postgres`pg_vsnprintf(str="Query Text: 

Re: Use generation context to speed up tuplesorts

2022-01-07 Thread Ronan Dunklau
On Friday, January 7, 2022 at 13:03:28 CET, Tomas Vondra wrote:
> On 1/7/22 12:03, Ronan Dunklau wrote:
> > On Friday, December 31, 2021 at 22:26:37 CET, David Rowley wrote:
> >> I've attached some benchmark results that I took recently.  The
> >> spreadsheet contains results from 3 versions. master, master + 0001 -
> >> 0002, then master + 0001 - 0003.  The 0003 patch makes the code a bit
> >> more conservative about the chunk sizes it allocates and also tries to
> >> allocate the tuple array according to the number of tuples we expect
> >> to be able to sort in a single batch for when the sort is not
> >> estimated to fit inside work_mem.
> > 
> > (Sorry for trying to merge back the discussion on the two sides of the
> > thread)
> > 
> > In  https://www.postgresql.org/message-id/4776839.iZASKD2KPV%40aivenronan,
> > I expressed the idea of being able to tune glibc's malloc behaviour.
> > 
> > I implemented that (patch 0001) to provide a new hook which is called on
> > backend startup, and anytime we set work_mem. This hook is # defined
> > depending on the malloc implementation: currently a default, no-op
> > implementation is provided as well as a glibc's malloc implementation.
> 
> Not sure I'd call this a hook - that usually means a way to plug-in
> custom code through a callback, and this is simply ifdefing a block of
> code to pick the right implementation. Which may be a good way to do
> that, just let's not call that a hook.
> 
> There's a commented-out MallocTuneHook() call, probably not needed.

Ok, I'll clean that up if we decide to proceed with this.

> 
> I wonder if #ifdefing is sufficient solution, because it happens at
> compile time, so what if someone overrides the allocator in LD_PRELOAD?
> That was a fairly common way to use a custom allocator in an existing
> application. But I don't know how many people do that with Postgres (I'm
> not aware of anyone doing that) or if we support that (it'd probably
> apply to other stuff too, not just malloc). So maybe it's OK, and I
> can't think of a better way anyway.

I couldn't think of a better way either, maybe there is something to be done 
with trying to dlsym something specific to glibc's malloc implementation ?


> 
> > The glibc malloc implementation relies on a new GUC,
> > glibc_malloc_max_trim_threshold. When set to its default value of -1, we
> > don't tune malloc at all, exactly as in HEAD. If a different value is
> > provided, we set M_MMAP_THRESHOLD to half this value, and M_TRIM_THRESHOLD
> > to this value, capped by work_mem / 2 and work_mem respectively.
> > 
> > The net result is that we can then keep more unused memory at the
> > top of the heap, and use mmap less frequently, if the DBA chooses to.
> > Another possible use case would be, on the contrary, to limit the
> > allocated memory in idle backends to a minimum.
> > 
> > The reasoning behind this is that glibc's malloc default way of handling
> > those two thresholds is to adapt to the size of the last freed mmaped
> > block.
> > 
> > I've run the same "up to 32 columns" benchmark as you did, with this new
> > patch applied on top of both HEAD and your v2 patchset incorporating
> > planner estimates for the block sizes. Those are called "aset" and
> > "generation" in the attached spreadsheet. For each, I've run it with
> > glibc_malloc_max_trim_threshold set to -1, 1MB, 4MB and 64MB. In each case
> > 
> > I've measured two things:
> >   - query latency, as reported by pgbench
> >   - total memory allocated by malloc at backend exit after running each
> >   query
> > 
> > three times. This represents the "idle" memory consumption, and thus what
> > we waste in malloc instead of releasing back to the system. This
> > measurement has been performed using the very small module presented in
> > patch 0002. Please note that I in no way propose that we include this
> > module, it was just a convenient way for me to measure memory footprint.
> > 
> > My conclusion is that the impressive gains you see from using the
> > generation context with bigger blocks mostly come from the fact that we
> > allocate bigger blocks, and that this moves the mmap thresholds
> > accordingly. I wonder how much of a difference it would make on other
> > malloc implementations: I'm afraid the optimisation presented here would
> > in fact be specific to glibc's malloc, since we have almost the same
> > gains with both allocators when tuning malloc to keep more memory. I
> > still think both approaches are useful, and would be necessary.
> Interesting measurements. It's intriguing that for generation contexts,
> the default "-1" often outperforms "1MB" (but not the other options),
> while for aset it's pretty much "the higher value the better".

For generation context with "big block sizes" this result is expected, as the 
malloc dynamic tuning will adapt to the big block size. This can also be seen 
on the "idle memory" measurement: the memory consumption is identical to the 
64MB value 

Python Plain Text Sender

2022-01-07 Thread Ali Koca

Dear Hackers,

I decided that we need a basic e-mail sender for the command line, for
when we are coding things and asking one-line questions.


Thanks, Ali Koca

From 21be4e168fae83bfcd67b219d6e4234a48f0f4f9 Mon Sep 17 00:00:00 2001
From: Ali Koca 
Date: Fri, 7 Jan 2022 15:28:12 +0300
Subject: [PATCH] sending plain text e-mail

---
 src/tools/plain_text_mail.py | 26 ++
 1 file changed, 26 insertions(+)
 create mode 100644 src/tools/plain_text_mail.py

diff --git a/src/tools/plain_text_mail.py b/src/tools/plain_text_mail.py
new file mode 100644
index 00..8b36871874
--- /dev/null
+++ b/src/tools/plain_text_mail.py
@@ -0,0 +1,26 @@
+"""-
+***
+*
+* plain_text_mail.py
+*   - This tool sends plain text emails.
+*
+* Created by: Ali Koca
+* Copyright (c) 2017-2021, PostgreSQL Global Development Group
+*
+***-"""
+
+import smtplib, ssl
+import os
+
+port = 465
+smtp_server = os.getenv("SMTP_SERVER")
+sender_email = os.getenv("EMAIL_ADDRESS")
+
+receiver_email = input("Enter receiver email: ")
+password = input("Type your password and press enter: ")
+message = input("Enter your message: ")
+
+context = ssl.create_default_context()
+with smtplib.SMTP_SSL(smtp_server, port, context=context) as server:
+    server.login(sender_email, password)
+    server.sendmail(sender_email, receiver_email, message)
\ No newline at end of file
-- 
2.32.0.windows.1



Fix vcregress plpython3 warning

2022-01-07 Thread Juan José Santamaría Flecha
Hi,

When running plcheck on Windows we get the following warning; it is
visible in the cfbot [1]:

Use of uninitialized value $1 in concatenation (.) or string at
src/tools/msvc/vcregress.pl line 350.

This points to mangle_plpython3 subroutine. The attached patch addresses
the problem.

[1] http://commitfest.cputube.org/

Regards,

Juan José Santamaría Flecha


0001-Fix-vcregress-plpython3-warning.patch
Description: Binary data


Re: Use generation context to speed up tuplesorts

2022-01-07 Thread Tomas Vondra

On 1/7/22 12:03, Ronan Dunklau wrote:

Le vendredi 31 décembre 2021, 22:26:37 CET David Rowley a écrit :

I've attached some benchmark results that I took recently.  The
spreadsheet contains results from 3 versions. master, master + 0001 -
0002, then master + 0001 - 0003.  The 0003 patch makes the code a bit
more conservative about the chunk sizes it allocates and also tries to
allocate the tuple array according to the number of tuples we expect
to be able to sort in a single batch for when the sort is not
estimated to fit inside work_mem.


(Sorry for trying to merge back the discussion on the two sides of the thread)

In  https://www.postgresql.org/message-id/4776839.iZASKD2KPV%40aivenronan, I
expressed the idea of being able to tune glibc's malloc behaviour.

I implemented that (patch 0001) to provide a new hook which is called on
backend startup, and anytime we set work_mem. This hook is #defined depending
on the malloc implementation: currently a default, no-op implementation is
provided, as well as a glibc malloc implementation.



Not sure I'd call this a hook - that usually means a way to plug-in 
custom code through a callback, and this is simply ifdefing a block of 
code to pick the right implementation. Which may be a good way to do 
that, just let's not call that a hook.


There's a commented-out MallocTuneHook() call, probably not needed.

I wonder if #ifdefing is sufficient solution, because it happens at 
compile time, so what if someone overrides the allocator in LD_PRELOAD? 
That was a fairly common way to use a custom allocator in an existing 
application. But I don't know how many people do that with Postgres (I'm 
not aware of anyone doing that) or if we support that (it'd probably 
apply to other stuff too, not just malloc). So maybe it's OK, and I 
can't think of a better way anyway.



The glibc malloc implementation relies on a new GUC,
glibc_malloc_max_trim_threshold. When set to its default value of -1, we
don't tune malloc at all, exactly as in HEAD. If a different value is provided,
we set M_MMAP_THRESHOLD to half this value, and M_TRIM_THRESHOLD to this value,
capped by work_mem / 2 and work_mem respectively.

The net result is that we can then keep more unused memory at the top
of the heap, and use mmap less frequently, if the DBA chooses to.
Another possible use case would be, on the contrary, to limit the
allocated memory in idle backends to a minimum.

The reasoning behind this is that glibc's malloc default way of handling those
two thresholds is to adapt to the size of the last freed mmaped block.

I've run the same "up to 32 columns" benchmark as you did, with this new patch
applied on top of both HEAD and your v2 patchset incorporating planner
estimates for the block sizes. Those are called "aset" and "generation" in the
attached spreadsheet. For each, I've run it with
glibc_malloc_max_trim_threshold set to -1, 1MB, 4MB and 64MB. In each case
I've measured two things:
  - query latency, as reported by pgbench
  - total memory allocated by malloc at backend exit after running each query
three times. This represents the "idle" memory consumption, and thus what we
waste in malloc instead of releasing back to the system. This measurement has
been performed using the very small module presented in patch 0002. Please
note that I in no way propose that we include this module, it was just a
convenient way for me to measure memory footprint.

My conclusion is that the impressive gains you see from using the generation
context with bigger blocks mostly come from the fact that we allocate bigger
blocks, and that this moves the mmap thresholds accordingly. I wonder how much
of a difference it would make on other malloc implementations: I'm afraid the
optimisation presented here would in fact be specific to glibc's malloc, since
we have almost the same gains with both allocators when tuning malloc to keep
more memory. I still think both approaches are useful, and would be necessary.



Interesting measurements. It's intriguing that for generation contexts, 
the default "-1" often outperforms "1MB" (but not the other options), 
while for aset it's pretty much "the higher value the better".



Since this affects all memory allocations, I need to come up with other
meaningful scenarios to benchmark.



OK. Are you thinking about a different microbenchmark, or something 
closer to real workload?



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: RFC: Logging plan of the running query

2022-01-07 Thread torikoshia

On 2022-01-07 14:30, torikoshia wrote:

Updated the patch to fix a compiler warning about the format on
Windows.


I got another compiler warning, updated the patch again.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From b8367e22d7a9898e4b85627ba8c203be273fc22f Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 7 Jan 2022 19:38:29 +0900
Subject: [PATCH v16] Add function to log the untruncated query string and its
 plan for the query currently running on the backend with the specified
 process ID.

Currently, we have to wait for the query execution to finish
to check its plan. This is not so convenient when
investigating long-running queries in production environments
where we cannot use debuggers.
To improve this situation, this patch adds
pg_log_query_plan() function that requests to log the
plan of the specified backend process.

By default, only superusers are allowed to request to log the
plans because allowing any users to issue this request at an
unbounded rate would cause lots of log messages, which can
lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its plan at LOG_SERVER_ONLY level, so
that these plans will appear in the server log but not be sent
to the client.

Since some codes, tests and comments of
pg_log_query_plan() are the same with
pg_log_backend_memory_contexts(), this patch also refactors
them to make them common.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby

---
 doc/src/sgml/func.sgml   |  45 +++
 src/backend/catalog/system_functions.sql |   2 +
 src/backend/commands/explain.c   | 117 ++-
 src/backend/executor/execMain.c  |  10 ++
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/storage/ipc/signalfuncs.c|  55 +
 src/backend/storage/lmgr/lock.c  |   9 +-
 src/backend/tcop/postgres.c  |   7 ++
 src/backend/utils/adt/mcxtfuncs.c|  36 +-
 src/backend/utils/init/globals.c |   1 +
 src/include/catalog/pg_proc.dat  |   6 +
 src/include/commands/explain.h   |   3 +
 src/include/miscadmin.h  |   1 +
 src/include/storage/lock.h   |   2 -
 src/include/storage/procsignal.h |   1 +
 src/include/storage/signalfuncs.h|  22 
 src/include/tcop/pquery.h|   1 +
 src/test/regress/expected/misc_functions.out |  54 +++--
 src/test/regress/sql/misc_functions.sql  |  42 +--
 19 files changed, 355 insertions(+), 63 deletions(-)
 create mode 100644 src/include/storage/signalfuncs.h

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e58efce586..9804574c10 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25430,6 +25430,26 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_query_plan
+
+pg_log_query_plan ( pid integer )
+boolean
+   
+   
+Requests to log the plan of the query currently running on the
backend with the specified process ID, along with the untruncated
+query string.
+They will be logged at LOG message level and
+will appear in the server log based on the log
+configuration set (See 
+for more information), but will not be sent to the client
+regardless of .
+   
+  
+
   

 
@@ -25543,6 +25563,31 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.

 
+   
+pg_log_query_plan can be used
+to log the plan of a backend process. For example:
+
+postgres=# SELECT pg_log_query_plan(201116);
+ pg_log_query_plan
+---
+ t
+(1 row)
+
+The format of the query plan is the same as when VERBOSE,
+COSTS, SETTINGS and
+FORMAT TEXT are used in the EXPLAIN
+command. For example:
+
+LOG:  plan of the query running on backend with PID 17793 is:
+Query Text: SELECT * FROM pgbench_accounts;
+Seq Scan on public.pgbench_accounts  (cost=0.00..52787.00 rows=200 width=97)
+  Output: aid, bid, abalance, filler
+Settings: work_mem = '1MB'
+
+Note that nested statements (statements executed inside a function) are not
+considered for logging. Only the plan of the most deeply nested query is logged.
+   
+
   
 
   
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 3a4fa9091b..173e268be3 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -711,6 +711,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_logicalmapdir() FROM PUBLIC;
 
 REVOKE EXECUTE ON FUNCTION pg_ls_replslotdir(text) FROM PUBLIC;
 
+REVOKE EXECUTE ON FUNCTION 

Re: Column Filtering in Logical Replication

2022-01-07 Thread Peter Eisentraut

I think this is getting pretty good now.  I like the overall behavior now.

Some details:

There are still a few references to "filter", but I see most of the
patch now uses column list or something.  Maybe do another cleanup
pass before finalizing the patch.

doc/src/sgml/catalogs.sgml needs to be updated.

doc/src/sgml/ref/alter_publication.sgml:

"allows to change" -> "allows changing"

src/backend/catalog/pg_publication.c:

publication_translate_columns(): I find the style of having a couple
of output arguments plus a return value that is actually another
output value confusing.  (It would be different if the return value
was some kind of success value.)  Let's make it all output arguments.

About the XXX question there: I would make the column numbers always
sorted.  I don't have a strong reason for this, but otherwise we might
get version differences, unstable dumps etc.  It doesn't seem
complicated to keep this a bit cleaner.

I think publication_translate_columns() also needs to prohibit
generated columns.  We already exclude those implicitly throughout the
logical replication code, but if a user explicitly set one here,
things would probably break.

src/backend/commands/tablecmds.c:

ATExecReplicaIdentity(): Regarding the question of how to handle
REPLICA_IDENTITY_NOTHING: I see two ways to do this.  Right now, the
approach is that the user can set the replica identity freely, and we
decide later based on that what we can replicate (e.g., no updates).
For this patch, that would mean we don't restrict what columns can be
in the column list, but we check what actions we can replicate based
on the column list.  The alternative is that we require the column
list to include the replica identity, as the patch is currently doing,
which would mean that REPLICA_IDENTITY_NOTHING can be allowed since
it's essentially a set of zero columns.

I find the current behavior a bit weird on reflection.  If a user
wants to replicate on some columns and only INSERTs, that should be
allowed regardless of what the replica identity columns are.

src/backend/replication/pgoutput/pgoutput.c:

In get_rel_sync_entry(), why did you remove the block

-   if (entry->pubactions.pubinsert && entry->pubactions.pubupdate &&
-       entry->pubactions.pubdelete && entry->pubactions.pubtruncate)
-       break;

Maybe this is intentional, but it's not clear to me.




Re: \dP and \dX use ::regclass without "pg_catalog."

2022-01-07 Thread Michael Paquier
On Fri, Jan 07, 2022 at 06:30:30PM +0900, Tatsuro Yamada wrote:
> We should prefix them with pg_catalog as well.
> Are you planning to make a patch?
> If not, I'll make a patch later since that's where \dX is.

If any of you can make a patch, that would be great.  Thanks!
--
Michael


signature.asc
Description: PGP signature


Re: Logical replication timeout problem

2022-01-07 Thread Amit Kapila
On Wed, Dec 29, 2021 at 5:02 PM Fabrice Chapuis 
wrote:

> I put the instance with high level debug mode.
> I try to do some log interpretation: After having finished writing the
> modifications generated by the insert in the snap files,
> then these files are read (restored). One minute after this work starts,
> the worker process exit with an error code = 1.
> I see that keepalive messages were sent before the work process work leave.
>
> 2021-12-28 10:50:01.894 CET [55792] LOCATION:  WalSndKeepalive,
> walsender.c:3365
> ...
> 2021-12-28 10:50:31.854 CET [55792] STATEMENT:  START_REPLICATION SLOT
> "sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
> '"pub008_s012a00"')
> 2021-12-28 10:50:31.907 CET [55792] DEBUG:  0: StartTransaction(1)
> name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
> 2021-12-28 10:50:31.907 CET [55792] LOCATION:  ShowTransactionStateRec,
> xact.c:5075
> 2021-12-28 10:50:31.907 CET [55792] STATEMENT:  START_REPLICATION SLOT
> "sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
> '"pub008_s012a00"')
> 2021-12-28 10:50:31.907 CET [55792] DEBUG:  0: spill 2271 changes in
> XID 14312 to disk
> 2021-12-28 10:50:31.907 CET [55792] LOCATION:  ReorderBufferSerializeTXN,
> reorderbuffer.c:2245
> 2021-12-28 10:50:31.907 CET [55792] STATEMENT:  START_REPLICATION SLOT
> "sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
> '"pub008_s012a00"')
> *2021-12-28 10:50:32.110 CET [55792] DEBUG:  0: restored 4096/22603999
> changes from disk*
> 2021-12-28 10:50:32.110 CET [55792] LOCATION:  ReorderBufferIterTXNNext,
> reorderbuffer.c:1156
> 2021-12-28 10:50:32.110 CET [55792] STATEMENT:  START_REPLICATION SLOT
> "sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
> '"pub008_s012a00"')
> 2021-12-28 10:50:32.138 CET [55792] DEBUG:  0: restored 4096/22603999
> changes from disk
> ...
>
> *2021-12-28 10:50:35.341 CET [55794] DEBUG:  0: sending replication
> keepalive2021-12-28 10:50:35.341 CET [55794] LOCATION:  WalSndKeepalive,
> walsender.c:3365*
> ...
> *2021-12-28 10:51:31.995 CET [55791] ERROR:  XX000: terminating logical
> replication worker due to timeout*
>
> *2021-12-28 10:51:31.995 CET [55791] LOCATION:  LogicalRepApplyLoop,
> worker.c:1267*
>
>
It is still not clear to me why the problem happened. IIUC, after restoring
4096 changes from the snap files, we send them to the subscriber, and then
the apply worker should apply them one by one. Now, is it taking one minute
to restore 4096 changes, causing the apply worker to time out?

Could this function in* Apply main loop* in worker.c help to find a
> solution?
>
> rc = WaitLatchOrSocket(MyLatch,
> WL_SOCKET_READABLE | WL_LATCH_SET |
> WL_TIMEOUT | WL_POSTMASTER_DEATH,
> fd, wait_time,
> WAIT_EVENT_LOGICAL_APPLY_MAIN);
>

Can you explain why you think this will help in solving your current
problem?

--
With Regards,
Amit Kapila.


Re: \dP and \dX use ::regclass without "pg_catalog."

2022-01-07 Thread Tatsuro Yamada

Hi Justin,

On 2022/01/07 11:22, Justin Pryzby wrote:

But, we missed two casts to ::text which don't use pg_catalog.
Evidently the cast is to allow stable sorting.


Ah, you are right.

We should prefix them with pg_catalog as well.
Are you planning to make a patch?
If not, I'll make a patch later since that's where \dX is.

Regards,
Tatsuro Yamada






Re: row filtering for logical replication

2022-01-07 Thread Amit Kapila
On Fri, Jan 7, 2022 at 12:05 PM Amit Kapila  wrote:
>
> On Fri, Jan 7, 2022 at 9:44 AM Amit Kapila  wrote:
> >
> > On Thu, Jan 6, 2022 at 6:42 PM Euler Taveira  wrote:
> > >
> > > IMO we shouldn't reuse ReorderBufferChangeType. For a long-term solution, 
> > > it is
> > > fragile. ReorderBufferChangeType has values that do not matter for row 
> > > filter
> > > and it relies on the fact that REORDER_BUFFER_CHANGE_INSERT,
> > > REORDER_BUFFER_CHANGE_UPDATE and REORDER_BUFFER_CHANGE_DELETE are the 
> > > first 3
> > > values from the enum, otherwise, it breaks rfnodes and no_filters in
> > > pgoutput_row_filter().
> > >
> >
> > I think you mean to say it will break in pgoutput_row_filter_init(). I
> > see your point but OTOH, if we do what you are suggesting then don't
> > we need an additional mapping between ReorderBufferChangeType and
> > RowFilterPublishAction as row filter and pgoutput_change API need to
> > use those values.
> >
>
> Can't we use 0,1,2 as indexes for rfnodes/no_filters based on change
> type as they are local variables as that will avoid the fragileness
> you are worried about. I am slightly hesitant to introduce new enum
> when we are already using reorder buffer change type in pgoutput.c.
>

Euler, I have one more question about this patch for you. I see that
in the patch we are calling coerce_to_target_type() in
pgoutput_row_filter_init_expr() but do we really need the same? We
already do that via
transformPubWhereClauses->transformWhereClause->coerce_to_boolean
before storing where clause expression. It is not clear to me why that
is required? We might want to add a comment if that is required.

-- 
With Regards,
Amit Kapila.




Re: ICU for global collation

2022-01-07 Thread Julien Rouhaud
Hi,

I looked a bit more in this patch and I have some additional remarks.

On Thu, Dec 30, 2021 at 01:07:21PM +0100, Peter Eisentraut wrote:
> 
> So this is a different approach: If you choose ICU as the default locale for
> a database, you still need to specify lc_ctype and lc_collate settings, as
> before.  Unlike in the previous patch, where the ICU collation name was
> written in datcollate, there is now a third column (daticucoll), so we can
> store all three values.  This fixes the described problem.  Other than that,
> once you get all the initial settings right, it basically just works: The
> places that have ICU support now will use a database-wide ICU collation if
> appropriate, the places that don't have ICU support continue to use the
> global libc locale settings.

So just to confirm: a database can now have two different *default*
collations, a libc-based one for everything that doesn't work with ICU and
an ICU-based one (if specified) for everything else.  The ICU-based one is
optional, so if it's not provided everything works as before, using the
libc-based default collation.

As I mentioned, I think this approach is sensible.  However, should we
document which things are not ICU-aware?

> I changed the datcollate, datctype, and the new daticucoll fields to type
> text (from name).  That way, the daticucoll field can be set to null if it's
> not applicable.  Also, the limit of 63 characters can actually be a problem
> if you want to use some combination of the options that ICU locales offer.
> And for less extreme uses, having variable-length fields will save some
> storage, since typical locale names are much shorter.

I understand the need to have daticucoll as text; however, it's not clear to me
why this has to be changed for datcollate and datctype.  IIUC those will only
ever contain a libc-based collation and are still mandatory?
> 
> For the same reasons and to keep things consistent, I also changed the
> analogous pg_collation fields like that.

The respective fields in pg_collation are now nullable, so the changes there
sounds ok.

Digging a bit more in the patch here are some things that looks problematic.

- pg_upgrade

It checks (in check_locale_and_encoding()) the compatibility for each database,
and it looks like the daticucoll field should also be verified.  Other than
that I don't think there is anything else needed for the pg_upgrade part as
everything else should be handled by pg_dump (I didn't look at the changes yet
given the below problems).

- CREATE DATABASE

There's a new COLLATION_PROVIDER option, but the overall behavior seems quite
unintuitive.  As far as I can see the idea is to use LOCALE for the ICU default
collation, but as it's providing a default for the other values it's quite
annoying.  For instance:

=# CREATE DATABASE db COLLATION_PROVIDER icu LOCALE 'fr-x-icu' LC_COLLATE 
'en_GB.UTF-8';;
ERROR:  42809: invalid locale name: "fr-x-icu"
LOCATION:  createdb, dbcommands.c:397

Looking at the code it's actually complaining about LC_CTYPE.  If you want a
database with an ICU default collation, the lc_collate and lc_ctype should
inherit what's in the template database and not what was provided in the
LOCALE, I think.  You could still probably override them in some scenario, but
without a list of what isn't ICU-aware I can't really be sure of how often one
might have to do it.

Now, if I specify everything as needed it looks like it's missing some checks
on the ICU default collation when not using template0:

=# CREATE DATABASE db COLLATION_PROVIDER icu LOCALE 'en-x-icu' LC_COLLATE 
'en_GB.UTF-8' LC_CTYPE 'en_GB.UTF-8';;
CREATE DATABASE

=# SELECT datname, datcollate, datctype, daticucoll FROM pg_database ;
  datname  | datcollate  |  datctype   | daticucoll
---+-+-+
 postgres  | en_GB.UTF-8 | en_GB.UTF-8 | fr-x-icu
 db| en_GB.UTF-8 | en_GB.UTF-8 | en-x-icu
 template1 | en_GB.UTF-8 | en_GB.UTF-8 | fr-x-icu
 template0 | en_GB.UTF-8 | en_GB.UTF-8 | fr-x-icu
(4 rows)

Unless I'm missing something the same concerns about collation incompatibility
with objects in the source database should also apply for the ICU collation?

While at it, I'm not exactly sure of what the COLLATION_PROVIDER is supposed to
mean, as the same command but with a libc provider is accepted and has
the exact same result:

=# CREATE DATABASE db2 COLLATION_PROVIDER libc LOCALE 'en-x-icu' LC_COLLATE 
'en_GB.UTF-8' LC_CTYPE 'en_GB.UTF-8';;
CREATE DATABASE

=# SELECT datname, datcollate, datctype, daticucoll FROM pg_database ;
  datname  | datcollate  |  datctype   | daticucoll
---+-+-+
 postgres  | en_GB.UTF-8 | en_GB.UTF-8 | fr-x-icu
 db| en_GB.UTF-8 | en_GB.UTF-8 | en-x-icu
 template1 | en_GB.UTF-8 | en_GB.UTF-8 | fr-x-icu
 template0 | en_GB.UTF-8 | en_GB.UTF-8 | fr-x-icu
 db2   | en_GB.UTF-8 | en_GB.UTF-8 | en-x-icu
(5 rows)

Shouldn't db2 have a NULL daticucoll, and if so also complain about

Re: Index-only scan for btree_gist turns bpchar to char

2022-01-07 Thread Alexander Lakhin
Hello,
07.01.2022 09:26, Japin Li wrote:
> On Fri, 07 Jan 2022 at 03:21, Tom Lane  wrote:
>
> In any case, if we do need same() to implement the identical
> behavior to bpchareq(), then the other solution isn't sufficient
> either.
>
> So in short, it seems like we ought to do some compatibility testing
> and see if this code misbehaves at all with an index created by the
> old code.  I don't particularly want to do that ... any volunteers?
>
> Thanks for your patch, it looks good to me.  I'm not sure how to test this.
I will test it tomorrow.

Best regards,
Alexander




Re: In-placre persistance change of a relation

2022-01-07 Thread Kyotaro Horiguchi
At Thu, 06 Jan 2022 16:39:21 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Fantastic! I'll give it a try. Thanks!

I did that and found that the test stumbled on newlines.
The tests succeeded on platforms other than Windows.

Windows version fails for a real known issue.

[7916][postmaster] LOG:  received immediate shutdown request
[7916][postmaster] LOG:  database system is shut down
[6228][postmaster] LOG:  starting PostgreSQL 15devel, compiled by Visual C++ 
build 1929, 64-bit
[6228][postmaster] LOG:  listening on Unix socket 
"C:/Users/ContainerAdministrator/AppData/Local/Temp/NcMnt2KTsr/.s.PGSQL.58698"
[2948][startup] LOG:  database system was interrupted; last known up at 
2022-01-07 07:12:14 GMT
[2948][startup] LOG:  database system was not properly shut down; automatic 
recovery in progress
[2948][startup] LOG:  redo starts at 0/1484280
[2948][startup] LOG:  invalid record length at 0/14A47B8: wanted 24, got 0
[2948][startup] FATAL:  could not remove file "base/12759/16384.u": Permission 
denied
[6228][postmaster] LOG:  startup process (PID 2948) exited with exit code 1


Mmm.. Someone is still grasping the file after restart?

Anyway, I post the fixed version.  This still fails on Windows..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 48527df0c7d094a8ca7cc8d0c90df02bfd7c2614 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 11 Nov 2020 21:51:11 +0900
Subject: [PATCH v15 1/2] In-place table persistence change

Even though ALTER TABLE SET LOGGED/UNLOGGED does not require data
rewriting, it currently runs a heap rewrite, which causes a large
amount of file I/O.  This patch makes the command run without a heap
rewrite.  In addition, SET LOGGED while wal_level > minimal emits WAL
using XLOG_FPI instead of a massive number of HEAP_INSERTs, which
should be smaller.

This also allows cleaning up files left behind after a crash of the
transaction that created them.
---
 src/backend/access/rmgrdesc/smgrdesc.c|  52 ++
 src/backend/access/transam/README |   8 +
 src/backend/access/transam/xact.c |   7 +
 src/backend/access/transam/xlog.c |  17 +
 src/backend/catalog/storage.c | 545 +-
 src/backend/commands/tablecmds.c  | 266 +++--
 src/backend/replication/basebackup.c  |   3 +-
 src/backend/storage/buffer/bufmgr.c   |  88 +++
 src/backend/storage/file/fd.c |   4 +-
 src/backend/storage/file/reinit.c | 344 +++
 src/backend/storage/smgr/md.c |  93 ++-
 src/backend/storage/smgr/smgr.c   |  32 +
 src/backend/storage/sync/sync.c   |  20 +-
 src/bin/pg_rewind/parsexlog.c |  24 +
 src/common/relpath.c  |  47 +-
 src/include/catalog/storage.h |   3 +
 src/include/catalog/storage_xlog.h|  42 +-
 src/include/common/relpath.h  |   9 +-
 src/include/storage/bufmgr.h  |   2 +
 src/include/storage/fd.h  |   1 +
 src/include/storage/md.h  |   8 +-
 src/include/storage/reinit.h  |  10 +-
 src/include/storage/smgr.h|  17 +
 src/test/recovery/t/027_persistence_change.pl | 247 
 24 files changed, 1707 insertions(+), 182 deletions(-)
 create mode 100644 src/test/recovery/t/027_persistence_change.pl

diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index 773d57..d251f22207 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -40,6 +40,49 @@ smgr_desc(StringInfo buf, XLogReaderState *record)
 		 xlrec->blkno, xlrec->flags);
 		pfree(path);
 	}
+	else if (info == XLOG_SMGR_UNLINK)
+	{
+		xl_smgr_unlink *xlrec = (xl_smgr_unlink *) rec;
+		char	   *path = relpathperm(xlrec->rnode, xlrec->forkNum);
+
+		appendStringInfoString(buf, path);
+		pfree(path);
+	}
+	else if (info == XLOG_SMGR_MARK)
+	{
+		xl_smgr_mark *xlrec = (xl_smgr_mark *) rec;
+		char	   *path = GetRelationPath(xlrec->rnode.dbNode,
+		   xlrec->rnode.spcNode,
+		   xlrec->rnode.relNode,
+		   InvalidBackendId,
+		   xlrec->forkNum, xlrec->mark);
+		char	   *action;
+
+		switch (xlrec->action)
+		{
+			case XLOG_SMGR_MARK_CREATE:
+action = "CREATE";
+break;
+			case XLOG_SMGR_MARK_UNLINK:
+action = "DELETE";
+break;
+			default:
+action = "";
+break;
+		}
+
+		appendStringInfo(buf, "%s %s", action, path);
+		pfree(path);
+	}
+	else if (info == XLOG_SMGR_BUFPERSISTENCE)
+	{
+		xl_smgr_bufpersistence *xlrec = (xl_smgr_bufpersistence *) rec;
+		char	   *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
+
+		appendStringInfoString(buf, path);
+		appendStringInfo(buf, " persistence %d", xlrec->persistence);
+		pfree(path);
+	}
 }
 
 const char *
@@ -55,6 +98,15 @@ smgr_identify(uint8 info)
 		case XLOG_SMGR_TRUNCATE:
 			id = "TRUNCATE";
 			

Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers

2022-01-07 Thread Kyotaro Horiguchi
At Thu, 6 Jan 2022 23:55:01 -0800, SATYANARAYANA NARLAPURAM 
 wrote in 
> On Thu, Jan 6, 2022 at 11:24 PM Jeff Davis  wrote:
> 
> > On Wed, 2022-01-05 at 23:59 -0800, SATYANARAYANA NARLAPURAM wrote:
> > > I would like to propose a GUC send_Wal_after_quorum_committed which
> > > when set to ON, walsenders corresponds to async standbys and logical
> > > replication workers wait until the LSN is quorum committed on the
> > > primary before sending it to the standby. This not only simplifies
> > > the post failover steps but avoids unnecessary downtime for the async
> > > replicas. Thoughts?
> >
> > Do we need a GUC? Or should we just always require that sync rep is
> > satisfied before sending to async replicas?
> >
> 
> I proposed a GUC so as not to introduce a behavior change by default. I
> have no strong opinion on having a GUC or making the proposed behavior
> the default; I would love to get others' perspectives as well.
> 
> 
> >
> > It feels like the sync quorum should always be ahead of the async
> > replicas. Unless I'm missing a use case, or there is some kind of
> > performance gotcha.
> >
> 
> I couldn't think of a case that can cause serious performance issues but
> will run some experiments on this and post the numbers.

I think Jeff is saying that "quorum commit" by definition already
means that all out-of-quorum standbys are behind the quorum standbys.
I agree with that in a dictionary sense. But I can think of the case
where the response from the top-runner standby vanishes or gets stuck
somewhere on the network for some reason. In that case the primary
happily decides the quorum while ignoring the top-runner.

To avoid that misdecision, I can think of two possible "solutions".

One is to serialize WAL sending (which is of course unacceptable), and
the other is to send WAL to all standbys at once, then make the
decision only after making sure replies have been received from all
standbys (which is no longer quorum commit, in another sense..)

So I'm afraid that there's no sensible solution to the
hidden-forerunner problem with quorum commit.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center