Re: New docs chapter on Transaction Management and related changes

2022-11-29 Thread Simon Riggs
On Wed, 30 Nov 2022 at 01:51, Bruce Momjian  wrote:

> Thanks to Simon for getting this important
> information in our docs, and for the valuable feedback from others that
> made this even better.

And thanks to you for pulling that all together Bruce.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Slow standby snapshot

2022-11-29 Thread Simon Riggs
On Tue, 29 Nov 2022 at 20:46, Tom Lane  wrote:
>
> I wrote:
> > That seems like a fairly bad idea: it will add extra contention
> > on ProcArrayLock, and I see no real strong argument that the path
> > can't get traversed often enough for that to matter.  It would
> > likely be better for KnownAssignedXidsCompress to obtain the lock
> > for itself, only after it knows there is something worth doing.
>
> Since we're running out of time in the current commitfest,
> I went ahead and changed that, and made the cosmetic fixes
> I wanted, and pushed.

That is a complete patch from multiple angles; very happy here.

Thanks for a great job.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Reducing power consumption on idle servers

2022-11-29 Thread Simon Riggs
On Wed, 30 Nov 2022 at 03:50, Thomas Munro  wrote:
>
> On Wed, Nov 30, 2022 at 1:32 AM Simon Riggs
>  wrote:
> > Re-attaching patch for bgwriter and walwriter, so it is clear this is
> > not yet committed.
>
> I'm just curious, and not suggesting that 60s wakeups are a problem
> for the polar ice caps, but why even time out at all?  Are the latch
> protocols involved not reliable enough?  At a guess from a quick
> glance, the walwriter's is but maybe the bgwriter could miss a wakeup
> as it races against StrategyGetBuffer(), which means you might stay
> asleep until the *next* buffer allocation, but that's already true I
> think, and a 60s timeout is not much of a defence.

That sounds reasonable.

It does sound like we agree that the existing behavior of waking up
every 5s or 2.5s is not good. I hope you will act to improve that.

The approach taken in this patch, and others of mine, has been to
offer a minimal change that achieves the objective of lengthy
hibernation to save power.

Removing the timeout entirely may not work in other circumstances I
have not explored. Doing that requires someone to check that it
actually works, and for others to believe that check has occurred. For
me, that is too time-consuming to happen in this dev cycle, and time
is part of the objective, since perfect designs with unreleased code
have no utility.



-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-29 Thread Simon Riggs
On Wed, 30 Nov 2022 at 03:56, Nathan Bossart  wrote:
>
> On Tue, Nov 29, 2022 at 12:02:44PM +, Simon Riggs wrote:
> > The last important point for me is tests, in src/test/modules
> > probably. It might be possible to reuse the final state of other
> > modules' tests to test cleanup, or at least integrate a custodian test
> > into each module.
>
> Of course.  I found some existing tests for the test_decoding plugin that
> appear to reliably generate the files we want the custodian to clean up, so
> I added them there.

Thanks for adding the tests; I can see they run clean.

The only minor thing I would personally add is a note in each piece of
code to explain where the tests are for each one, and/or something in
the main custodian file that says tests exist within src/test/modules.

Otherwise, ready for committer.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Bug in wait time when waiting on nested subtransaction

2022-11-29 Thread Simon Riggs
On Mon, 28 Nov 2022 at 21:10, Robert Haas  wrote:
>
> On Mon, Nov 28, 2022 at 3:01 PM Tom Lane  wrote:
> > One thing we need to be pretty careful of here is to not break the
> > promise of atomic commit.  At topmost commit, all subxacts must
> > appear committed simultaneously.  It's not quite clear to me whether
> > we need a similar guarantee in the rollback case.  It seems like
> > we shouldn't, but maybe I'm missing something, in which case maybe
> > the current behavior is correct?
>
> In short, I think Simon is right that there's a problem and right
> about which commit caused it, but I'm not sure what I think we ought
> to do about it.

I'm comfortable with ignoring it, on the basis that it *is* a
performance optimization, but I suggest we keep the test (with
modified output) and document the behavior, if we do.

The really big issue is the loss of performance we get from having
subtrans point only to its immediate parent, which makes
XidInMVCCSnapshot() go really slow in the presence of lots of
subtransactions. So ignoring the issue on this thread will open the
door for the optimization posted for this patch:
https://commitfest.postgresql.org/40/3806/

--
Simon Riggs    http://www.EnterpriseDB.com/




Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-11-29 Thread Simon Riggs
On Thu, 17 Nov 2022 at 17:29, Simon Riggs  wrote:

> (yes, the last line shows x10 performance patched, that is not a typo)

New version of patch, now just a one-line patch!

Results show it's still a good win for XidInMVCCSnapshot().

-- 
Simon Riggs    http://www.EnterpriseDB.com/


subtrans_points_to_topxid.v13.patch
Description: Binary data


Re: Reducing power consumption on idle servers

2022-11-29 Thread Simon Riggs
On Sun, 20 Nov 2022 at 20:00, Simon Riggs  wrote:
>
> On Thu, 24 Mar 2022 at 16:21, Robert Haas  wrote:
> >
> > On Thu, Mar 24, 2022 at 12:02 PM Simon Riggs
>
> > > What changes will be acceptable for bgwriter, walwriter and logical 
> > > worker?
> >
> > Hmm, I think it would be fine to introduce some kind of hibernation
> > mechanism for logical workers. bgwriter and wal writer already have a
> > hibernation mechanism, so I'm not sure what your concern is there
> > exactly. In your initial email you said you weren't proposing changes
> > there, but maybe that changed somewhere in the course of the
> > subsequent discussion. If you could catch me up on your present
> > thinking that would be helpful.
>
> Now that we seem to have solved the problem for the startup process,
> let's circle back to the others.

Thanks for committing changes to the startup process.

> Bgwriter does hibernate currently, but only at 50x the bgwriter_delay.
> At default values that is 5s, but could be much less. So we need to
> move that up to 60s, same as others.
>
> WALwriter also hibernates currently, but only at 25x the
> wal_writer_delay. At default values that is 2.5s, but could be much
> less. So we need to move that up to 60s, same as others. At the same
> time, make sure that when we hibernate we use a new WaitEvent,
> similarly to the way bgwriter reports its hibernation state (which
> also helps test the patch).

Re-attaching patch for bgwriter and walwriter, so it is clear this is
not yet committed.

I've added this to the next CF, since the entry was closed when the
startup patch was committed.

-- 
Simon Riggs    http://www.EnterpriseDB.com/


hibernate_bgwriter_walwriter.v5.patch
Description: Binary data


Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-29 Thread Simon Riggs
On Mon, 28 Nov 2022 at 23:40, Nathan Bossart  wrote:
>
> Okay, here is a new patch set.  0004 adds logic to prevent custodian tasks
> from delaying shutdown.

That all seems good, thanks.

The last important point for me is tests, in src/test/modules
probably. It might be possible to reuse the final state of other
modules' tests to test cleanup, or at least integrate a custodian test
into each module.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Reducing power consumption on idle servers

2022-11-29 Thread Simon Riggs
On Mon, 28 Nov 2022 at 23:16, Thomas Munro  wrote:
>
> I found some more comments and some documentation to word-smith very
> lightly, and pushed.

Thanks

> The comments were stray references to the
> trigger file.  It's
> a little confusing because the remaining mechanism also uses a file,
> but it uses a signal first so seems better to refer to promotion
> requests rather than files.

..and again

> /me wonders how many idle servers there are in the world

My estimate is there are 100 million servers in use worldwide, with
only about 1% of them on a continuously busy duty cycle and most of
them not in the cloud.

If we guess that we save 10W when idle, then that saves some proportion of a GW.

It's not a huge contribution to the effort, but if by doing this we
inspire others to do the same, we may yet make a difference.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Bug in wait time when waiting on nested subtransaction

2022-11-28 Thread Simon Riggs
On Mon, 28 Nov 2022 at 17:38, Robert Haas  wrote:
>
> On Mon, Nov 28, 2022 at 10:28 AM Simon Riggs
>  wrote:
> > So we have these options
> >
> > 1. Removing the XactLockTableDelete() call in CommitSubTransaction().
> > That releases lock waiters earlier than expected, which requires
> > pushups in XactLockTableWait() to cope with that (which are currently
> > broken). Why do we do it? To save shmem space in the lock table should
> > anyone want to run a transaction that contains thousands of
> > subtransactions, or more. So option (1) alone would eventually cause
> > us to run out of space in the lock table and a transaction would
> > receive ERRORs rather than be allowed to run for a long time.
>
> This seems unprincipled to me. The point of having a lock on the
> subtransaction in the lock table is so that people can wait for the
> subtransaction to end. If we don't remove the lock table entry at
> subtransaction end, then that lock table entry doesn't serve that
> purpose any more.

An easy point to confuse:
"subtransaction to end": the subtransaction is "still running" as far
as other backends are concerned even AFTER it has been subcommitted,
but its state now transfers to the parent.

So the subtransaction doesn't cease running until it aborts, one of
its parents aborts, or the top level commits. The subxid lock should,
in principle, exist until one of those events occurs. It doesn't, but
that is an optimization, for the stated reason.

(All of the above is current behavior).

> > 2. In XactLockTableWait(), replace the call to SubTransGetParent(), so
> > we go up the levels one by one as we did before. However, (2) causes
> > huge subtrans contention and if we implemented that and backpatched it
> > the performance issues could be significant. So my feeling is that if
> > we do (2) then we should not backpatch it.
>
> What I find suspicious about the coding of this function is that it
> doesn't distinguish between commits and aborts at all. Like, if a
> subtransaction commits, the changes don't become globally visible
> until the parent transaction also commits. If a subtransaction aborts,
> though, what happens to the top-level XID doesn't seem to matter at
> all. The comment seems to agree:
>
>  * Note that this does the right thing for subtransactions: if we wait on a
>  * subtransaction, we will exit as soon as it aborts or its top parent 
> commits.
>
> I feel like what I'd expect to see given this comment is code which
> (1) waits until the supplied XID is no longer running, (2) checks
> whether the XID aborted, and if so return at once, and (3) otherwise
> recurse to the parent XID. But the code doesn't do that. Perhaps
> that's not actually the right thing to do, since it seems like a big
> behavior change, but then I don't understand the comment.

As I mention above, the correct behavior is that the subxact doesn't
cease running until it aborts, one of its parents aborts, or the top
level commits.

Which is slightly different from the comment, which may explain why
the bug exists.

> Incidentally, one possible optimization here to try to release locking
> traffic would be to try waiting for the top-parent first using a
> conditional lock acquisition. If that works, cool. If not, go back
> around and work up the tree level by level. Since that path would only
> be taken in the unhappy case where we're probably going to have to
> wait anyway, the cost probably wouldn't be too bad.

That sounds like a potential bug fix (not just an optimization).
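
In sketch form, that conditional-acquisition fast path might look like
this (a sketch only, using the existing lmgr API; topxid is assumed to
have been obtained from subtrans):

    LOCKTAG     tag;

    /* Fast path: try the topmost xid's lock without blocking. */
    SET_LOCKTAG_TRANSACTION(tag, topxid);
    if (LockAcquire(&tag, ShareLock, false, true) != LOCKACQUIRE_NOT_AVAIL)
    {
        /* Acquired it, so the top-level transaction has ended. */
        LockRelease(&tag, ShareLock, false);
    }
    else
    {
        /*
         * We would probably have to wait anyway, so fall back to
         * walking up from the original subxid one level at a time.
         */
    }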

> > My preferred solution would be a mix of the above, call it option (3)
> >
> > 3.
> > a) Include the lock table entry for the first 64 subtransactions only,
> > so that we limit shmem. For those first 64 entries, have the subtrans
> > point direct to top, since this makes a subtrans lookup into an O(1)
> > operation, which is important for performance of later actions.
> >
> > b) For any subtransactions after first 64, delete the subxid lock on
> > subcommit, to save shmem, but make subtrans point to the immediate
> > parent (only), not the topxid. That fixes the bug, but causes
> > performance problems in XidInMVCCSnapshot() and others, so we also do
> > c) and d)
> >
> > c) At top level commit, go back and alter subtrans again for subxids
> > so now it points to the topxid, so that we avoid O(N) behavior in
> > XidInMVCCSnapshot() and other callers. Additional cost for longer
> > transactions, but it saves considerable cost in later callers that
> > need to call SubTransGetTopmostTransaction().
> >
> > d) Optimize SubTransGetTopmostTransaction() so it retrieves entries
> > page-at-a-time. This will reduce the contention of repeatedly
> > re-visiting the same page(s) and ensure that a page is less often
> > paged out when we are still using it.

Re: Bug in wait time when waiting on nested subtransaction

2022-11-28 Thread Simon Riggs
On Mon, 28 Nov 2022 at 18:53, Alvaro Herrera  wrote:
>
> On 2022-Nov-28, Simon Riggs wrote:
>
> > A narrative description of the issue follows:
> > session1 - requests multiple nested subtransactions like this:
> > BEGIN; ...
> > SAVEPOINT subxid1; ...
> > SAVEPOINT subxid2; ...
>
> > However, if subxid2 subcommits, then the lock wait moves from subxid2
> > to the topxid.
>
> Hmm, do we really do that?  Seems very strange .. it sounds to me like
> the lock should have been transferred to subxid1 (which is subxid2's
> parent), not to the top-level Xid.

Correct; that is exactly what I'm saying and why we have a bug since
3c27944fb2141.

> Maybe what the user wanted was to
> release subxid1 before establishing subxid2?  Or do they want to
> continue to be able to rollback to subxid1 after establishing subxid2?
> (but why?)

This isn't a description of a user's actions, it is a script that
illustrates the bug in XactLockTableWait().

Perhaps a better example would be nested code blocks with EXCEPTION
clauses where the outer block fails...
e.g.

DO $$
BEGIN
  PERFORM 1;  -- PERFORM, since a bare SELECT is not allowed in PL/pgSQL

  BEGIN
    PERFORM 1;
  EXCEPTION WHEN OTHERS THEN
    RAISE NOTICE 's2';
  END;

  RAISE division_by_zero; -- now back in outer subxact, which now fails

EXCEPTION WHEN OTHERS THEN
  RAISE NOTICE 's1';
END;$$;

Of course, debugging this is harder since there is no way to return
the current subxid in SQL.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Bug in wait time when waiting on nested subtransaction

2022-11-28 Thread Simon Riggs
Issue in current XactLockTableWait, starting with 3c27944fb2141,
discovered while reviewing https://commitfest.postgresql.org/40/3806/

Test demonstrating the problem is 001-isotest-tuplelock-subxact.v1.patch

A narrative description of the issue follows:
session1 - requests multiple nested subtransactions like this:
BEGIN; ...
SAVEPOINT subxid1; ...
SAVEPOINT subxid2; ...

If another session2 sees an xid from subxid2, it waits. If subxid2
aborts, then session2 sees the abort and can continue processing
normally.
However, if subxid2 subcommits, then the lock wait moves from subxid2
to the topxid. If subxid1 subsequently aborts, it will also abort
subxid2, but session2, waiting for subxid2 to complete, doesn't see
this and waits for the topxid instead. This means that it waits longer
than it should, and later-arriving lock waiters may actually get
served first.

So it's a bug, but not too awful, since in many cases people don't use
nested subtransactions, and if they do use SAVEPOINTs, they don't
often close them using RELEASE. And in most cases the extra wait time
is not very long, which is why nobody has ever reported this issue.

Thanks to Robert Haas and Julien Tachoires for asking the question
"are you sure the existing coding is correct?". You were both right;
it is not.

How to fix? Correct lock waiting can be achieved while a subxid is
running if we have either
* a lock table entry for the subxid, OR
* a subtrans entry that points to its immediate parent

So we have these options

1. Removing the XactLockTableDelete() call in CommitSubTransaction().
That releases lock waiters earlier than expected, which requires
pushups in XactLockTableWait() to cope with that (which are currently
broken). Why do we do it? To save shmem space in the lock table should
anyone want to run a transaction that contains thousands of
subtransactions, or more. So option (1) alone would eventually cause
us to run out of space in the lock table and a transaction would
receive ERRORs rather than be allowed to run for a long time.

2. In XactLockTableWait(), replace the call to SubTransGetParent() so
that we go up the levels one by one, as we did before (see the sketch
below). However, (2) causes huge subtrans contention, and if we
implemented that and backpatched it, the performance issues could be
significant. So my feeling is that if we do (2) then we should not
backpatch it.
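
As a sketch, option (2) would make the wait loop climb one level per
iteration, roughly like this (illustrative only; assumes subtrans
entries point at the immediate parent):

    for (;;)
    {
        LOCKTAG     tag;

        SET_LOCKTAG_TRANSACTION(tag, xid);

        /* Sleep until this xid's lock table entry is released. */
        (void) LockAcquire(&tag, ShareLock, false, false);
        LockRelease(&tag, ShareLock, false);

        /* If the xid is no longer in progress, we are done waiting. */
        if (!TransactionIdIsInProgress(xid))
            break;

        /*
         * The subxid subcommitted: its lock entry is gone, but it is
         * still "running" as part of its parent.  Move up one level
         * only, so that an abort of any intermediate subtransaction
         * releases us promptly.
         */
        xid = SubTransGetParent(xid);
        if (!TransactionIdIsValid(xid))
            break;
    }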

So both (1) and (2) have issues.

The main result from patch https://commitfest.postgresql.org/40/3806/
is that having subtrans point direct to topxid is very good for
performance in XidInMVCCSnapshot(), and presumably other places
also. This bug prevents the simple application of a patch to improve
performance. So now we need a stronger mix of ideas to both resolve
the bug and fix the subtrans contention issue in HEAD.

My preferred solution would be a mix of the above, call it option (3)

3.
a) Include the lock table entry for the first 64 subtransactions only,
so that we limit shmem. For those first 64 entries, have the subtrans
point direct to top, since this makes a subtrans lookup into an O(1)
operation, which is important for performance of later actions.

b) For any subtransactions after first 64, delete the subxid lock on
subcommit, to save shmem, but make subtrans point to the immediate
parent (only), not the topxid. That fixes the bug, but causes
performance problems in XidInMVCCSnapshot() and others, so we also do
c) and d)

c) At top level commit, go back and alter subtrans again for subxids
so now it points to the topxid, so that we avoid O(N) behavior in
XidInMVCCSnapshot() and other callers. Additional cost for longer
transactions, but it saves considerable cost in later callers that
need to call SubTransGetTopmostTransaction().

d) Optimize SubTransGetTopmostTransaction() so it retrieves entries
page-at-a-time. This will reduce the contention of repeatedly
re-visiting the same page(s) and ensure that a page is less often
paged out when we are still using it.

Thoughts?

--
Simon Riggs    http://www.EnterpriseDB.com/


001-isotest-tuplelock-subxact.v1.patch
Description: Binary data


Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-28 Thread Simon Riggs
On Sun, 27 Nov 2022 at 23:34, Nathan Bossart  wrote:
>
> Thanks for taking a look!
>
> On Thu, Nov 24, 2022 at 05:31:02PM +, Simon Riggs wrote:
> > * not sure I believe that everything it does can always be aborted out
> > of and shutdown - to achieve that you will need a
> > CHECK_FOR_INTERRUPTS() calls in the loops in patches 5 and 6 at least
>
> I did something like this earlier, but was advised to simply let the
> functions finish as usual during shutdown [0].  I think this is what the
> checkpointer process does today, anyway.

If we say "The custodian is not an essential process and can shut down
quickly when requested", and yet we know it's not true in all cases,
then that will lead to misunderstandings and bugs.

If we perform a restart and the custodian is performing extra work
that delays shutdown, then it also delays restart. Given the title of
the thread, we should be looking to improve that, or at least know it
occurred.

> > * not sure why you want immediate execution of custodian tasks - I
> > feel supporting two modes will be a lot harder. For me, I would run
> > locally when !IsUnderPostmaster and also in an Assert build, so we can
> > test it works right - i.e. running in its own process is just a
> > production optimization for performance (which is the stated reason
> > for having this)
>
> I added this because 0004 involves requesting a task from the postmaster,
> so checking for IsUnderPostmaster doesn't work.  Those tasks would always
> run immediately.  However, we could use IsPostmasterEnvironment instead,
> which would allow us to remove the "immediate" argument.  I did it this way
> in v14.

Thanks

> > 0005 seems good from what I know
> > * There is no check to see if it worked in any sane time
>
> What did you have in mind?  Should the custodian begin emitting WARNINGs
> after a while?

I think it might be useful if it logged anything that took an
"extended period", TBD.

Maybe that is already covered by startup process logging. Please tell
me that still works?

> > Rather than explicitly use DEBUG1 everywhere I would have an
> > #define CUSTODIAN_LOG_LEVEL LOG
> > so we can run with it in LOG mode and then set it to DEBUG1 with a one
> > line change in a later phase of Beta
>
> I can create a separate patch for this, but I don't think I've ever seen
> this sort of thing before.

Much of recovery is coded that way, for the same reason.

> Is the idea just to help with debugging during
> the development phase?

"Just", yes. Tests would be desirable also, under src/test/modules.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-24 Thread Simon Riggs
On Fri, 21 Oct 2022 at 17:09, Maxim Orlov  wrote:

> Reviews and opinions are very welcome!

I'm wondering whether the safest way to handle this is by creating a
new TAM called "heap64", so that all storage changes happen there.
(Obviously there are still many other changes in core, but they are
more easily fixed.)

That would reduce the code churn around "heap", allowing us to keep it
stable while we move to the brave new world.

Many current users see stability as one of the greatest strengths of
Postgres, so while I very much support this move, I wonder if this
gives us a way to have both stability and innovation at the same time?
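
Mechanically, a separate TAM would slot into the existing handler
pattern; a minimal sketch, following heap_tableam_handler() (the names
heap64_tableam_handler and heap64_methods are hypothetical):

    #include "postgres.h"
    #include "access/tableam.h"
    #include "fmgr.h"

    /* Table AM callbacks, with 64-bit-xid variants where needed. */
    static const TableAmRoutine heap64_methods = {
        .type = T_TableAmRoutine,
        /* ... tuple, scan and DDL callbacks ... */
    };

    PG_FUNCTION_INFO_V1(heap64_tableam_handler);

    Datum
    heap64_tableam_handler(PG_FUNCTION_ARGS)
    {
        PG_RETURN_POINTER(&heap64_methods);
    }

after which CREATE ACCESS METHOD heap64 TYPE TABLE HANDLER
heap64_tableam_handler and CREATE TABLE ... USING heap64 would select
it, leaving "heap" itself untouched.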

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: O(n) tasks cause lengthy startups and checkpoints

2022-11-24 Thread Simon Riggs
On Thu, 24 Nov 2022 at 00:19, Nathan Bossart  wrote:
>
> On Sun, Nov 06, 2022 at 02:38:42PM -0800, Nathan Bossart wrote:
> > rebased
>
> another rebase for cfbot

0001 seems good to me
* I like that it sleeps forever until requested
* not sure I believe that everything it does can always be aborted out
of and shutdown - to achieve that you will need a
CHECK_FOR_INTERRUPTS() calls in the loops in patches 5 and 6 at least
* not sure why you want immediate execution of custodian tasks - I
feel supporting two modes will be a lot harder. For me, I would run
locally when !IsUnderPostmaster and also in an Assert build, so we can
test it works right - i.e. running in its own process is just a
production optimization for performance (which is the stated reason
for having this)

0005 seems good from what I know
* There is no check to see if it worked in any sane time
* It seems possible that "Old" might change meaning - will that make
it break/fail?

0006 seems good also
* same comments for 5

Rather than explicitly use DEBUG1 everywhere I would have an
#define CUSTODIAN_LOG_LEVEL LOG
so we can run with it in LOG mode and then set it to DEBUG1 with a one
line change in a later phase of Beta.
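
Concretely, something like this (sketch; the macro is the one
suggested above, and the message text is made up):

    #define CUSTODIAN_LOG_LEVEL LOG     /* flip to DEBUG1 late in beta */

    ereport(CUSTODIAN_LOG_LEVEL,
            (errmsg("custodian removed %d obsolete files", nremoved)));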

I can't really comment with knowledge on sub-patches 0002 to 0004.

Perhaps you should aim to get 1, 5, 6 committed first and then return
to the others in a later CF/separate thread?

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Damage control for planner's get_actual_variable_endpoint() runaway

2022-11-24 Thread Simon Riggs
On Tue, 22 Nov 2022 at 18:44, Tom Lane  wrote:
>
> I wrote:
> > Still wondering if there's really no CHECK_FOR_INTERRUPT anywhere
> > else in this loop.
>
> I did some experimentation using the test case Jakub presented
> to start with, and verified that that loop does respond promptly
> to control-C even in HEAD.  So there are CFI(s) in the loop as
> I thought, and we don't need another.

Thanks for checking. Sorry for not responding earlier.

> What we do need is some more work on nearby comments.  I'll
> see about that and push it.

Thanks; nicely streamlined.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Hash index build performance tweak from sorting

2022-11-23 Thread Simon Riggs
On Wed, 23 Nov 2022 at 13:04, David Rowley  wrote:

> After getting rid of the  HashInsertState code and just adding bool
> sorted to _hash_doinsert() and _hash_pgaddtup(), the resulting patch
> is much more simple:

Seems good to me and I wouldn't argue with any of your comments.

> and v4 includes 7 extra lines in hashinsert.c for the Assert() I
> mentioned in my previous email plus a bunch of extra comments.

Oh, I did already include that in v3 as requested.

> I'd rather see this solved like v4 is doing it.

Please do. No further comments. Thanks for your help

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Allow single table VACUUM in transaction block

2022-11-22 Thread Simon Riggs
On Tue, 22 Nov 2022 at 16:43, Justin Pryzby  wrote:
>
> On Mon, Nov 21, 2022 at 03:07:25PM +, Simon Riggs wrote:
> > Attached patch implements VACUUM (BACKGROUND).
> >
> > There are quite a few small details to consider; please read the docs
> > and comments.
> >
> > There is a noticeable delay before the background vacuum starts.
>
> You disallowed some combinations of unsupported options, but not others,
> like FULL, PARALLEL, etc.  They should either be supported or
> prohibited.
>
> +   /* use default values */
> +   tab.at_params.log_min_duration = 0;
>
> 0 isn't the default ?
>
> Maybe VERBOSE should mean to set min_duration=0, otherwise it should use
> the default ?

+1

> You only handle one rel, but ExecVacuum() has a loop around rels.
>
> +NOTICE:  autovacuum of "vactst" has been requested, using the options 
> specified
>
> => I don't think it's useful to say "using the options specified".
>
> Should autovacuum de-duplicate requests ?
> BRIN doesn't do that, but it's intended for append-only tables, so the
> issue doesn't really come up.

Easy to do

> Could add psql tab-completion.
>
> Is it going to be confusing that the session's GUC variables won't be
> transmitted to autovacuum ?  For example, the freeze and costing
> parameters.

I think we should start with the "how do I want it to behave" parts of
the above and leave spelling and tab completion as final items.

Other questions are whether there should be a limit on number of
background vacuums submitted at any time.
Whether there should be a GUC that specifies the max number of queued tasks.
Do we need a query that shows what items are queued?
etc

Justin, if you wanted to take up the patch from here, I would be more
than happy. You have the knowledge and insight to make this work
right.

We should probably start a new CF patch entry so we can return the
original patch as rejected, then continue with this new idea
separately.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Slow standby snapshot

2022-11-22 Thread Simon Riggs
On Tue, 22 Nov 2022 at 16:53, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > On Tue, 22 Nov 2022 at 16:28, Tom Lane  wrote:
> >> If we do those things, do we need a wasted-work counter at all?
>
> > The wasted work counter works well to respond to heavy read-only
> > traffic and also avoids wasted compressions for write-heavy workloads.
> > So I still like it the best.
>
> This argument presumes that maintenance of the counter is free,
> which it surely is not.  I don't know how bad contention on that
> atomically-updated variable could get, but it seems like it could
> be an issue when lots of processes are acquiring snapshots.

I understand. I was assuming that you and Andres liked that approach.

In the absence of that approach, falling back to a counter that
compresses every N xids would be best, in addition to the two new
forced compression events.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Slow standby snapshot

2022-11-22 Thread Simon Riggs
On Tue, 22 Nov 2022 at 16:28, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > We seem to have replaced one magic constant with another, so not sure
> > if this is autotuning, but I like it much better than what we had
> > before (i.e. better than my prev patch).
>
> Yeah, the magic constant is still magic, even if it looks like it's
> not terribly sensitive to the exact value.
>
> > 1. I was surprised that you removed the limits on size and just had
> > the wasted work limit. If there is no read traffic that will mean we
> > hardly ever compress, which means the removal of xids at commit will
> > get slower over time.  I would prefer that we forced compression on a
> > regular basis, such as every time we process an XLOG_RUNNING_XACTS
> > message (every 15s), as well as when we hit certain size limits.
>
> > 2. If there is lots of read traffic but no changes flowing, it would
> > also make sense to force compression when the startup process goes
> > idle rather than wait for the work to be wasted first.
>
> If we do those things, do we need a wasted-work counter at all?
>
> I still suspect that 90% of the problem is the max_connections
> dependency in the existing heuristic, because of the fact that
> you have to push max_connections to the moon before it becomes
> a measurable problem.  If we do
>
> -if (nelements < 4 * PROCARRAY_MAXPROCS ||
> -nelements < 2 * pArray->numKnownAssignedXids)
> +if (nelements < 2 * pArray->numKnownAssignedXids)
>
> and then add the forced compressions you suggest, where
> does that put us?

The forced compressions I propose happen
* when idle - since we have time to do it then, and most workloads are
bursty, so that happens often
* every 15s - since we already hold the lock
which is overall much less often than every 64 commits, as benchmarked
by Michail.
I didn't mean to imply that superseded the wasted-work approach; it
was meant to be in addition to it.

The wasted work counter works well to respond to heavy read-only
traffic and also avoids wasted compressions for write-heavy workloads.
So I still like it the best.
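
In sketch form, the wasted-work heuristic amounts to the following
(names such as wastedKAXWork follow the patch concept and its
WASTED_SNAPSHOT_WORK_LIMIT_TO_COMPRESS constant; none of this is
committed code):

    /* Reader side: count the invalid slots we had to step over. */
    pg_atomic_fetch_add_u32(&pArray->wastedKAXWork, nskipped);

    /* Writer side: compress once enough reader work has been wasted. */
    if (pg_atomic_read_u32(&pArray->wastedKAXWork) >=
        WASTED_SNAPSHOT_WORK_LIMIT_TO_COMPRESS)
        KnownAssignedXidsCompress(true);    /* also resets the counter */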

> Also, if we add more forced compressions, it seems like we should have
> a short-circuit for a forced compression where there's nothing to do.
> So more or less like
>
> nelements = head - tail;
> if (!force)
> {
> if (nelements < 2 * pArray->numKnownAssignedXids)
> return;
> }
> else
> {
> if (nelements == pArray->numKnownAssignedXids)
> return;
> }

+1

> I'm also wondering why there's not an
>
> Assert(compress_index == pArray->numKnownAssignedXids);
>
> after the loop, to make sure our numKnownAssignedXids tracking
> is sane.

+1

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Damage control for planner's get_actual_variable_endpoint() runaway

2022-11-22 Thread Simon Riggs
On Mon, 21 Nov 2022 at 20:55, Robert Haas  wrote:
>
> On Mon, Nov 21, 2022 at 1:17 PM Andres Freund  wrote:
> > On November 21, 2022 9:37:34 AM PST, Robert Haas  
> > wrote:
> > >On Mon, Nov 21, 2022 at 12:30 PM Andres Freund  wrote:
> > >> This can't quite be right - isn't this only applying the limit if we 
> > >> found a
> > >> visible tuple?
> > >
> > >It doesn't look that way to me, but perhaps I'm just too dense to see
> > >the problem?
> >
> > The earlier version didn't have the issue, but the latest seems to only 
> > limit after a visible tuple has been found. Note the continue; when 
> > fetching a heap tuple fails.
>
> Oh, that line was removed in Simon's patch but not in Jakub's version,
> I guess. Jakub's version also leaves out the last_block = block line
> which seems pretty critical.

New patch version reporting for duty, sir. Please take it from here!

-- 
Simon Riggs    http://www.EnterpriseDB.com/


0001-Damage-control-for-planner-s-get_actual_variable_end.v3.patch
Description: Binary data


Re: Code checks for App Devs, using new options for transaction behavior

2022-11-22 Thread Simon Riggs
On Mon, 7 Nov 2022 at 14:25, Simon Riggs  wrote:
>
> On Wed, 2 Nov 2022 at 07:40, Simon Riggs  wrote:
>
> > > What will be the behavior if someone declares a savepoint with this
> > > name ("_internal_nested_xact").  Will this interfere with this new
> > > functionality?
> >
> > Clearly! I guess you are saying we should disallow them.
> >
> > > Have we tested scenarios like that?
> >
> > No, but that can be done.
>
> More tests as requested, plus minor code rework, plus comment updates.

New versions

-- 
Simon Riggs    http://www.EnterpriseDB.com/


001_psql_parse_only.v1.patch
Description: Binary data


002_nested_xacts.v11.patch
Description: Binary data


003_rollback_on_commit.v2.patch
Description: Binary data


Re: Damage control for planner's get_actual_variable_endpoint() runaway

2022-11-22 Thread Simon Riggs
On Mon, 21 Nov 2022 at 22:15, Tom Lane  wrote:
>
> Andres Freund  writes:
> > On 2022-11-21 16:17:56 -0500, Robert Haas wrote:
> >> But ... what if they're not? Could the index contain a large number of
> >> pages containing just 1 tuple each, or no tuples at all? If so, maybe
> >> we can read ten bazillion index pages trying to find each heap tuple
> >> and still end up in trouble.
>
> > ISTM that if you have an index in such a poor condition that a single
> > value lookup reads thousands of pages inside the index, planner
> > estimates taking long is going to be the smallest of your worries...
>
> Yeah, that sort of situation is going to make any operation on the
> index slow, not only get_actual_variable_endpoint().

That was also my conclusion: this is actually a common antipattern for
our indexes, not anything specific to the planner.

In another recent situation, I saw a very bad case of performance for
a "queue table". In that use case the rows are inserted at head and
removed from tail. Searching for the next item to be removed from the
queue involves an increasingly long tail search, in the case that a
long running transaction prevents us from marking the index entries
killed. Many tables exhibit such usage, for example, the neworder
table in TPC-C.

We optimized the case of frequent insertions into the rightmost index
page; now we also need to optimize the case of a long series of
deletions from the leftmost index pages. Not sure how, just framing
the problem.


--
Simon Riggs    http://www.EnterpriseDB.com/




Re: Damage control for planner's get_actual_variable_endpoint() runaway

2022-11-22 Thread Simon Riggs
On Mon, 21 Nov 2022 at 23:34, Robert Haas  wrote:
>
> On Mon, Nov 21, 2022 at 6:17 PM Tom Lane  wrote:
> > evidence that it's a live problem.  API warts are really hard to
> > get rid of once instituted.
>
> Yeah, agreed.

Agreed, happy not to; that version was just a WIP to see how it might work.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Damage control for planner's get_actual_variable_endpoint() runaway

2022-11-21 Thread Simon Riggs
On Mon, 21 Nov 2022 at 18:17, Andres Freund  wrote:
>
> Hi,
>
> On November 21, 2022 9:37:34 AM PST, Robert Haas  
> wrote:
> >On Mon, Nov 21, 2022 at 12:30 PM Andres Freund  wrote:
> >> This can't quite be right - isn't this only applying the limit if we found 
> >> a
> >> visible tuple?
> >
> >It doesn't look that way to me, but perhaps I'm just too dense to see
> >the problem?
>
> The earlier version didn't have the issue, but the latest seems to only limit 
> after a visible tuple has been found. Note the continue; when fetching a heap 
> tuple fails.

Agreed, resolved in this version.


Robert, something like this perhaps? A limit on both the index and the heap.

-- 
Simon Riggs    http://www.EnterpriseDB.com/


0001-Damage-control-for-planner-s-get_actual_variable_end.v2.patch
Description: Binary data


Re: Damage control for planner's get_actual_variable_endpoint() runaway

2022-11-21 Thread Simon Riggs
On Mon, 21 Nov 2022 at 15:23, Robert Haas  wrote:
>
> On Mon, Nov 21, 2022 at 10:14 AM Simon Riggs
>  wrote:
> > > > What we need is a solution that avoids reading an unbounded number of
> > > > tuples under any circumstances. I previously suggested using
> > > > SnapshotAny here, but Tom didn't like that. I'm not sure if there are
> > > > safety issues there or if Tom was just concerned about the results
> > > > being misleading. Either way, maybe there's some variant on that theme
> > > > that could work. For instance, could we teach the index scan to stop
> > > > if the first 100 tuples that it finds are all invisible? Or to reach
> > > > at most 1 page, or at most 10 pages, or something?
> > >
> > > A hard limit on the number of index pages examined seems like it
> > > might be a good idea.
> >
> > Good, that is what the patch does.
>
> 
>
> Oh, that's surprisingly simple. Nice!
>
> Is there any reason to tie this into page costs? I'd be more inclined
> to just make it a hard limit on the number of pages. I think that
> would be more predictable and less prone to surprising (bad) behavior.
> And to be honest I would be inclined to make it quite a small number.
> Perhaps 5 or 10. Is there a good argument for going any higher?

+1, that makes the patch smaller and the behavior more predictable.

(Just didn't want to do anything too simple, in case it looked like a kluge.)
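
The resulting shape of the loop would be roughly as below (sketch;
constant name and value are only suggestions, counting distinct heap
pages via the last_block bookkeeping mentioned elsewhere in the
thread):

    #define VISITED_PAGES_LIMIT 10

    if (block != last_block)
    {
        last_block = block;
        if (++n_visited_heap_pages > VISITED_PAGES_LIMIT)
            break;      /* give up; use the pg_statistic endpoint */
    }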

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Damage control for planner's get_actual_variable_endpoint() runaway

2022-11-21 Thread Simon Riggs
On Mon, 21 Nov 2022 at 15:01, Tom Lane  wrote:
>
> > What we need is a solution that avoids reading an unbounded number of
> > tuples under any circumstances. I previously suggested using
> > SnapshotAny here, but Tom didn't like that. I'm not sure if there are
> > safety issues there or if Tom was just concerned about the results
> > being misleading. Either way, maybe there's some variant on that theme
> > that could work. For instance, could we teach the index scan to stop
> > if the first 100 tuples that it finds are all invisible? Or to reach
> > at most 1 page, or at most 10 pages, or something?
>
> A hard limit on the number of index pages examined seems like it
> might be a good idea.

Good, that is what the patch does.

> > If we don't find a
> > match, we could either try to use a dead tuple, or we could just
> > return false which, I think, would end up using the value from
> > pg_statistic rather than any updated value.
>
> Yeah, the latter seems like the best bet.

Yes, just breaking out of the loop is enough to use the default value.

> If we do install such a thing, should we undo any of the previous
> changes that backed off the reliability of the result?

Not sure.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Allow single table VACUUM in transaction block

2022-11-21 Thread Simon Riggs
On Fri, 18 Nov 2022 at 11:54, Simon Riggs  wrote:
>
> On Thu, 17 Nov 2022 at 20:00, Justin Pryzby  wrote:
> >
> > On Wed, Nov 16, 2022 at 05:14:07PM -0500, Greg Stark wrote:
> > > I think this requesting autovacuum worker should be a distinct
> > > command. Or at least an explicit option to vacuum.
> >
> > +1.  I was going to suggest VACUUM (NOWAIT) ..
>
> Yes, I have no problem with an explicit command.
>
> At the moment the patch runs VACUUM in the background in an autovacuum
> process, but the call is asynchronous, since we do not wait for the
> command to finish (or even start).
>
> So the command names I was thinking of would be one of these:
>
> VACUUM (BACKGROUND) or VACUUM (AUTOVACUUM) - which might be clearer
> or
> VACUUM (ASYNC) - which is more descriptive of the behavior
>
> or we could go for both
> VACUUM (BACKGROUND, ASYNC) - since this allows us to have a
> BACKGROUND, SYNC version in the future


Attached patch implements VACUUM (BACKGROUND).

There are quite a few small details to consider; please read the docs
and comments.

There is a noticeable delay before the background vacuum starts.

-- 
Simon Riggs    http://www.EnterpriseDB.com/


background_vacuum.v3.patch
Description: Binary data


Re: Allow single table VACUUM in transaction block

2022-11-21 Thread Simon Riggs
On Fri, 18 Nov 2022 at 18:26, Tom Lane  wrote:
>
> Robert Haas  writes:
> > On Fri, Nov 18, 2022 at 7:04 AM Simon Riggs
> >  wrote:
> >> So if consistency is also a strong requirement, then maybe we should
> >> make that new command the default, i.e. make VACUUM always just a
> >> request to vacuum in background. That way it will be consistent.
>
> > Since one fairly common reason for running vacuum in the foreground is
> > needing to vacuum a table when all autovacuum workers are busy, or
> > when they are vacuuming it with a cost limit and it needs to get done
> > sooner, I think this would surprise a lot of users in a negative way.
>
> It would also break a bunch of our regression tests, which expect a
> VACUUM to complete immediately.
>
> >> Can we at least have a vacuum_runs_in_background = on | off, to allow
> >> users to take advantage of this WITHOUT needing to rewrite all of
> >> their scripts?
>
> > I'm not entirely convinced that's a good idea, but happy to hear what
> > others think.
>
> I think the answer to that one is flat no.  We learned long ago that GUCs
> with significant semantic impact on queries are a bad idea.  For example,
> if a user issues VACUUM expecting behavior A and she gets behavior B
> because somebody changed the postgresql.conf entry, she won't be happy.
>
> Basically, I am not buying Simon's requirement that this be transparent.
> I think the downsides would completely outweigh whatever upside there
> may be (and given the shortage of prior complaints, I don't think the
> upside is very large).

Just to say I'm happy with that decision and will switch to the
request for a background vacuum.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Patch: Global Unique Index

2022-11-21 Thread Simon Riggs
On Thu, 17 Nov 2022 at 22:01, Cary Huang  wrote:
>
> Patch: Global Unique Index

Let me start by expressing severe doubt about the usefulness of such a
feature, while also saluting your efforts to contribute.

> In other words, a global unique index and a regular partitioned index are 
> essentially the same in terms of their storage structure except that one can 
> do cross-partition uniqueness check, the other cannot.

This is the only workable architecture, since it allows DETACH to be
feasible, which is essential.

You don't seem to mention that this would require a uniqueness check
on each partition. Is that correct? This would result in O(N) cost of
uniqueness checks, severely limiting load speed. I notice you don't
offer any benchmarks on load speed or the overhead associated with
this, which is not good if you want to be taken seriously, but at
least it is recoverable.

(It might be necessary to specify some partitions as READ ONLY, to
allow us to record their min/max values for the indexed cols, allowing
us to do this more quickly.)
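
To make the load-speed concern concrete: each INSERT must, in effect,
probe every partition's index rather than just one. A purely
illustrative shape, with a hypothetical check_unique_in_partition()
helper:

    /* O(N) per row: one uniqueness probe per leaf partition */
    for (int i = 0; i < npartitions; i++)
        check_unique_in_partition(partition_indexes[i], key);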

> - Supported Features -
> 1. Global unique index is supported only on btree index type

Why? Surely any index type that supports uniqueness is good.

> - Not-supported Features -
> 1. Global uniqueness check with sub-partition tables is not yet supported, as
> we do not have an immediate use case and it may involve major change in the
> current implementation

Hmm, sounds like a problem. Arranging the calls recursively should work.

> - Create a global unique index -
> To create a regular unique index on a partitioned table, Postgres has to 
> perform heap scan and sorting on every child partition. Uniqueness check 
> happens during the sorting phase and will raise an error if multiple tuples 
> with the same index key are sorted together. To achieve global uniqueness 
> check, we make Postgres perform the sorting after all of the child partitions 
> have been scanned instead of on the "sort per partition" fashion. In 
> otherwords, the sorting only happens once at the very end and it sorts the 
> tuples coming from all the partitions and therefore can ensure global 
> uniqueness.

My feeling is that performance on this will suck so badly that we must
warn people away from it, and tell people if they want this, create
the index at the start and let it load.

Hopefully CREATE INDEX CONCURRENTLY still works.

Let's see some benchmarks on this also please.

You'll need to think about progress reporting early because correctly
reporting the progress and expected run times are likely critical for
usability.

> Example:
>
> > CREATE TABLE gidx_part (a int, b int, c text) PARTITION BY RANGE (a);
> > CREATE TABLE gidx_part1 partition of gidx_part FOR VALUES FROM (0) TO (10);
> > CREATE TABLE gidx_part2 partition of gidx_part FOR VALUES FROM (10) TO (20);
> > CREATE UNIQUE INDEX global_unique_idx ON gidx_part USING BTREE(b) GLOBAL;
> > INSERT INTO gidx_part values(5, 5, 'test');
> > INSERT INTO gidx_part values(15, 5, 'test');
> ERROR:  duplicate key value violates unique constraint "gidx_part1_b_idx"
> DETAIL:  Key (b)=(5) already exists.

Well done.

> - DETACH -
> Since we retain the same partitioned structure, detaching a partition with 
> global unique index is straightforward. Upon DETACH, Postgres will change its 
> relkind from RELKIND_GLOBAL_INDEX to RELKIND_INDEX and remove their 
> inheritance relationship as usual.

It's the only way that works

> - Optimizer, query planning and vacuum -
> Since no major modification is done on global unique index's structure and 
> storage, it works in the same way as a regular partitioned index. No major 
> change is required to be done on optimizer, planner and vacuum process as 
> they should work in the same way as regular index.

Agreed


Making a prototype is a great first step.

The next step is to understand the good and the bad aspects of it, so
you can see what else needs to be done. You need to be honest and
realistic about the fact that this may not actually be desirable in
practice, or may be desirable only in a restricted use case.

That means performance analysis of create, load, attach, detach,
INSERT, SELECT, UPD/DEL and anything else that might be affected,
together with algorithmic analysis of what happens for larger N and
larger tables.

Expect many versions; make provision for many days.

Best of luck

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Reducing power consumption on idle servers

2022-11-21 Thread Simon Riggs
On Mon, 21 Nov 2022 at 08:40, Laurenz Albe  wrote:
>
> On Mon, 2022-11-21 at 07:36 +, Simon Riggs wrote:
> > On Mon, 21 Nov 2022 at 05:07, Laurenz Albe  wrote:
> > >
> > > On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote:
> > > > I'll wait 24 hours before committing, to
> > > > provide a last chance for anyone who wants to complain about dropping
> > > > promote_trigger_file.
> > >
> > > Remove "promote_trigger_file"?  Now I have never seen anybody use that
> > > parameter, but I don't think that it is a good idea to deviate from our
> > > usual standard of deprecating a feature for about five years before
> > > actually removing it.
> >
> > We aren't removing the ability to promote, just enforcing a change to
> > a better mechanism, hence I don't see a reason for a long(er)
> > deprecation period than we have already had.
>
> We have had a deprecation period?  I looked at the documentation, but found
> no mention of a deprecation.  How hard can it be to leave the GUC and only
> poll for the existence of the file if it is set?
>
> I personally don't need the GUC, and I know nobody who does,

Nobody else does either.

> I disagree.  With the same argument, you could rip out "serial", since we
> have supported identity columns since v11.

...and this is not a user-facing change; only HA systems interface with this.

> but I think
> we should not be cavalier about introducing unnecessary compatibility breaks.

I agree we should not be cavalier, nor has anyone been so. The first
version of the patch was cautious on this very point. But since many
people think we should remove it, it is not a user-facing feature, and
nobody on this thread knows anybody or anything that uses it, I have
changed my mind and now think we should remove it.

We have two versions to choose from now
* v6 deprecates this option
* v10 removes this option
so it is no effort at all to choose either option.

The main issue is about reducing power usage, so please let's move to
commit something soon, one way or another.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Reducing power consumption on idle servers

2022-11-20 Thread Simon Riggs
On Sun, 20 Nov 2022 at 22:55, Nathan Bossart  wrote:
>
> On Mon, Nov 21, 2022 at 10:31:15AM +1300, Thomas Munro wrote:
> > On Mon, Nov 21, 2022 at 9:00 AM Simon Riggs
> >  wrote:
> >> As a 3rd patch, I will work on making logical workers hibernate.
> >
> > Duelling patch warning: Nathan mentioned[1] that he's hacking on a
> > patch for that, along the lines of the recent walreceiver change IIUC.
>
> I coded something up last week, but, like the walreceiver patch, it caused
> check-world to take much longer [0], and I haven't looked into whether it
> could be easily fixed.  I'm hoping to make some time for this again in the
> near future.
>
> [0] https://postgr.es/m/20221113222644.GA1269110%40nathanxps13

OK, Nathan, will leave this one to you - remembering that we need to
fix ALL processes to get a useful power reduction when idle.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Reducing power consumption on idle servers

2022-11-20 Thread Simon Riggs
On Mon, 21 Nov 2022 at 05:07, Laurenz Albe  wrote:
>
> On Mon, 2022-11-21 at 10:13 +1300, Thomas Munro wrote:
> > I'll wait 24 hours before committing, to
> > provide a last chance for anyone who wants to complain about dropping
> > promote_trigger_file.
>
> Remove "promote_trigger_file"?  Now I have never seen anybody use that
> parameter, but I don't think that it is a good idea to deviate from our
> usual standard of deprecating a feature for about five years before
> actually removing it.

We aren't removing the ability to promote, just enforcing a change to
a better mechanism, hence I don't see a reason for a long(er)
deprecation period than we have already had.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Reducing power consumption on idle servers

2022-11-20 Thread Simon Riggs
On Thu, 24 Mar 2022 at 16:21, Robert Haas  wrote:
>
> On Thu, Mar 24, 2022 at 12:02 PM Simon Riggs

> > What changes will be acceptable for bgwriter, walwriter and logical worker?
>
> Hmm, I think it would be fine to introduce some kind of hibernation
> mechanism for logical workers. bgwriter and wal writer already have a
> hibernation mechanism, so I'm not sure what your concern is there
> exactly. In your initial email you said you weren't proposing changes
> there, but maybe that changed somewhere in the course of the
> subsequent discussion. If you could catch me up on your present
> thinking that would be helpful.

Now that we seem to have solved the problem for the startup process,
let's circle back to the others.

Bgwriter does hibernate currently, but only at 50x the bgwriter_delay.
At default values that is 5s, but could be much less. So we need to
move that up to 60s, same as others.

WALwriter also hibernates currently, but only at 25x the
wal_writer_delay. At default values that is 2.5s, but could be much
less. So we need to move that up to 60s, same as others. At the same
time, make sure that when we hibernate we use a new WaitEvent,
similarly to the way bgwriter reports its hibernation state (which
also helps test the patch).
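
For both processes, hibernation is essentially just a longer
WaitLatch() timeout plus a dedicated wait event, along these lines
(sketch; bgwriter already has WAIT_EVENT_BGWRITER_HIBERNATE, the
walwriter equivalent is assumed):

    #define HIBERNATE_DELAY_MS  60000   /* 60s, same as others */

    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     HIBERNATE_DELAY_MS,
                     WAIT_EVENT_BGWRITER_HIBERNATE);
    ResetLatch(MyLatch);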


As a 3rd patch, I will work on making logical workers hibernate.

-- 
Simon Riggs    http://www.EnterpriseDB.com/


hibernate_bgwriter_walwriter.v5.patch
Description: Binary data


Re: Slow standby snapshot

2022-11-20 Thread Simon Riggs
On Sun, 20 Nov 2022 at 13:45, Michail Nikolaev
 wrote:

> If such approach looks committable - I could do more careful
> performance testing to find the best value for
> WASTED_SNAPSHOT_WORK_LIMIT_TO_COMPRESS.

Nice patch.

We seem to have replaced one magic constant with another, so not sure
if this is autotuning, but I like it much better than what we had
before (i.e. better than my prev patch).

Few thoughts

1. I was surprised that you removed the limits on size and just had
the wasted work limit. If there is no read traffic that will mean we
hardly ever compress, which means the removal of xids at commit will
get slower over time.  I would prefer that we forced compression on a
regular basis, such as every time we process an XLOG_RUNNING_XACTS
message (every 15s), as well as when we hit certain size limits.

2. If there is lots of read traffic but no changes flowing, it would
also make sense to force compression when the startup process goes
idle rather than wait for the work to be wasted first.

Quick patch to add those two compression events also.

That should favour the smaller wasted work limits.
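
In outline, the two extra call sites would be (sketch; the bool-force
signature follows the patch under review here, not necessarily what
gets committed):

    /* 1. In the startup process, once the redo loop finds no more WAL
     * ready to apply: */
    KnownAssignedXidsCompress(true);

    /* 2. In standby_redo(), on seeing an XLOG_RUNNING_XACTS record
     * (roughly every 15 seconds): */
    KnownAssignedXidsCompress(true);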

-- 
Simon Riggs    http://www.EnterpriseDB.com/


events_that_force_compression.v1.patch
Description: Binary data


Re: Reducing power consumption on idle servers

2022-11-20 Thread Simon Riggs
On Sat, 19 Nov 2022 at 10:59, Simon Riggs  wrote:

> New version attached.

Fix for doc xref

-- 
Simon Riggs    http://www.EnterpriseDB.com/


hibernate_startup.v10.patch
Description: Binary data


Re: Reducing power consumption on idle servers

2022-11-19 Thread Simon Riggs
On Fri, 18 Nov 2022 at 20:26, Thomas Munro  wrote:
>
> On Sat, Nov 19, 2022 at 7:54 AM Simon Riggs
>  wrote:
> > I agree. I can't see a reason to keep it anymore.
>
> +Use of promote_trigger_file is deprecated. If you're
>
> I think 'deprecated' usually implies that it still works but you
> should avoid it.  I think you need something stronger.

Whisky? Or maybe just reword the sentence...

New version attached.

> > I'm nervous about not having any wakeup at all, but since we are
> > removing the parameter there is no other reason not to do as Andres
> > suggests.
>
> Why?  If we're accidentally relying on this timeout for recovery to
> not hang in some situation, that's a bug waiting to be discovered and
> fixed and it won't be this patch's fault.
>
> > New version attached, which assumes that the SIGALRMs are silenced on
> > the other thread.
>
> I tested this + Bharath's v5 from the other thread.  meson test
> passes, and tracing the recovery process shows that it is indeed,
> finally, completely idle.  Huzzah!

Thanks for testing!

Finally completely idle? Good to achieve that.

-- 
Simon Riggs    http://www.EnterpriseDB.com/


hibernate_startup.v9.patch
Description: Binary data


Re: Reducing power consumption on idle servers

2022-11-18 Thread Simon Riggs
On Thu, 17 Nov 2022 at 20:38, Robert Haas  wrote:
>
> On Thu, Nov 17, 2022 at 2:55 AM Simon Riggs
>  wrote:
> > No, it will have a direct effect only on people using promote_trigger_file
> > who do not read and act upon the deprecation notice before upgrading
> > by making a one line change to their failover scripts.
>
> TBH, I wonder if we shouldn't just nuke promote_trigger_file.
> pg_promote was added in 2018, and pg_ctl promote was added in 2011. So
> even if we haven't said promote_trigger_file was deprecated, it hasn't
> been the state-of-the-art way of doing this in a really long time. If
> we think that there are still a lot of people using it, or if popular
> tools are relying on it, then perhaps a deprecation period is
> warranted anyway. But I think we should at least consider the
> possibility that it's OK to just get rid of it right away.

I agree. I can't see a reason to keep it anymore.

I'm nervous about not having any wakeup at all, but since we are
removing the parameter there is no other reason not to do as Andres
suggests.

New version attached, which assumes that the SIGALRMs are silenced on
the other thread.

-- 
Simon Riggs    http://www.EnterpriseDB.com/


hibernate_startup.v8.patch
Description: Binary data


Re: Allow single table VACUUM in transaction block

2022-11-18 Thread Simon Riggs
On Thu, 17 Nov 2022 at 20:06, Robert Haas  wrote:
>
> On Wed, Nov 16, 2022 at 5:14 PM Greg Stark  wrote:
> > However I'm not a fan of commands that sometimes do one thing and
> > sometimes magically do something very different. I don't like the idea
> > that the same vacuum command would sometimes run in-process and
> > sometimes do this out of process request. And the rules for when it
> > does which are fairly complex to explain -- it runs in process unless
> > you're in a transaction when it runs out of process unless it's a
> > temporary table ...
>
> 100% agree.

I agree as well.

At the moment, the problem (OT) is that VACUUM behaves inconsistently

Outside a transaction - works perfectly
In a transaction - throws ERROR, which prevents a whole script from
executing correctly

What we are trying to do is avoid the ERROR. I don't want them to
behave like this, but that's the only option possible to avoid ERROR.

So if consistency is also a strong requirement, then maybe we should
make that new command the default, i.e. make VACUUM always just a
request to vacuum in background. That way it will be consistent.

Can we at least have a vacuum_runs_in_background = on | off, to allow
users to take advantage of this WITHOUT needing to rewrite all of
their scripts?
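
For illustration only, such a GUC might look like the sketch below (a
guc.c-style table entry; the variable name, group, and wording are my
placeholders, not an agreed design):

    /* hypothetical boolean GUC -- all names here are placeholders */
    {
        {"vacuum_runs_in_background", PGC_USERSET, AUTOVACUUM,
            gettext_noop("Runs VACUUM commands in an autovacuum worker."),
            NULL
        },
        &VacuumRunsInBackground,
        false,
        NULL, NULL, NULL
    },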

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Allow single table VACUUM in transaction block

2022-11-18 Thread Simon Riggs
On Thu, 17 Nov 2022 at 20:00, Justin Pryzby  wrote:
>
> On Wed, Nov 16, 2022 at 05:14:07PM -0500, Greg Stark wrote:
> > I think this requesting autovacuum worker should be a distinct
> > command. Or at least an explicit option to vacuum.
>
> +1.  I was going to suggest VACUUM (NOWAIT) ..

Yes, I have no problem with an explicit command.

At the moment the patch runs VACUUM in the background in an autovacuum
process, but the call is asynchronous, since we do not wait for the
command to finish (or even start).

So the command names I was thinking of would be one of these:

VACUUM (BACKGROUND) or VACUUM (AUTOVACUUM) - which might be clearer
or
VACUUM (ASYNC) - which is more descriptive of the behavior

or we could go for both
VACUUM (BACKGROUND, ASYNC) - since this allows us to have a
BACKGROUND, SYNC version in the future

Thoughts?

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Reducing power consumption on idle servers

2022-11-18 Thread Simon Riggs
On Fri, 18 Nov 2022 at 08:55, Bharath Rupireddy
 wrote:

> However, I'm a bit
> worried about how it'll affect the tools/providers/extensions that
> depend on it.

Who is that? Which ones depend upon it?

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-11-18 Thread Simon Riggs
On Thu, 17 Nov 2022 at 20:29, Tomas Vondra
 wrote:
>
> On 11/17/22 18:29, Simon Riggs wrote:
> > On Thu, 17 Nov 2022 at 17:04, Simon Riggs  
> > wrote:
> >>
> > 003 includes the idea to not-always do SubTransSetParent()
> >
> I'm a bit confused by the TransactionIdsAreOnSameXactPage naming. Isn't
> this really checking clog pages?

Yes, clog page. I named it to match the new name of pg_xact

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-11-17 Thread Simon Riggs
On Wed, 16 Nov 2022 at 15:44, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > On Wed, 16 Nov 2022 at 00:09, Tom Lane  wrote:
> >> No, that's not what XidInMVCCSnapshot does. If snapshot->suboverflowed
> >> is set (ie, somebody somewhere/somewhen overflowed), then it does
> >> SubTransGetTopmostTransaction and searches only the xips with the result.
> >> This behavior requires that all live subxids be correctly mapped by
> >> SubTransGetTopmostTransaction, or we'll draw false conclusions.
>
> > Your comments are correct wrt to the existing coding, but not to the
> > patch, which is coded as described and does not suffer those issues.
>
> Ah, OK.
>
> Still ... I really do not like this patch.  It introduces a number of
> extremely fragile assumptions, and I think those would come back to
> bite us someday, even if they hold now which I'm still unsure about.

Completely understand. It took me months to think this through.

> It doesn't help that you've chosen to document them only at the place
> making them and not at the place(s) likely to break them.

Yes, apologies for that, I focused on the holistic explanation in the slides.

> Also, to be blunt, this is not Ready For Committer.  It's more WIP,
> because even if the code is okay there are comments all over the system
> that you've invalidated.  (At the very least, the file header comments
> in subtrans.c and the comments in struct SnapshotData need work; I've
> not looked hard but I'm sure there are more places with comments
> bearing on these data structures.)

New version with greatly improved comments coming very soon.

> Perhaps it would be a good idea to split up the patch.  The business
> about making pg_subtrans flat rather than a tree seems like a good
> idea in any event, although as I said it doesn't seem like we've got
> a fleshed-out version of that here.  We could push forward on getting
> that done and then separately consider the rest of it.

Yes, I thought you might ask that so, after some thought, have found a
clean way to do that and have split this into two parts.

Julien has agreed to do further perf tests and is working on that now.

I will post new versions soon, earliest tomorrow.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Reducing power consumption on idle servers

2022-11-16 Thread Simon Riggs
On Thu, 17 Nov 2022 at 07:36, Bharath Rupireddy
 wrote:

> > promote_trigger_file is not tested and there are better ways, so
> > deprecating it in this release is fine.
>
> Hm, but..
>
> > Anyone that relies on it can update their mechanisms to a supported
> > one with a one-line change. Realistically, anyone using it won't be on
> > the latest release anyway, at least for a long time, since if they use
> > manual methods then they are well behind the times.
>
> I may be overly pessimistic here - the change from 5 sec to 60 sec for
> detecting promote_trigger_file will have a direct impact on failovers
> I believe.

No, it will have a direct effect only on people using promote_trigger_file
who do not read and act upon the deprecation notice before upgrading
by making a one line change to their failover scripts.

Since pretty much everyone doing HA uses external HA software (cloud
or otherwise) this shouldn't affect anyone.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Reducing power consumption on idle servers

2022-11-16 Thread Simon Riggs
On Wed, 16 Nov 2022 at 12:47, Bharath Rupireddy
 wrote:
>
> On Wed, Nov 16, 2022 at 2:34 PM Simon Riggs
>  wrote:
> >
> > Reposting v6 now so that patch tester doesn't think this has failed
> > when the patch on other thread gets applied.
>
> Intention of the patch, that is, to get rid of promote_trigger_file
> GUC sometime in future, looks good to me. However, the timeout change
> to 60 sec from 5 sec seems far-reaching. While it discourages the use
> of the GUC, it can impact many existing production servers that still
> rely on promote_trigger_file as it can increase the failover times
> impacting SLAs around failover.

The purpose of 60s is to allow for power reduction, so 5s won't do.

promote_trigger_file is not tested and there are better ways, so
deprecating it in this release is fine.

Anyone that relies on it can update their mechanisms to a supported
one with a one-line change. Realistically, anyone using it won't be on
the latest release anyway, at least for a long time, since if they use
manual methods then they are well behind the times.
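
To be concrete, the change amounts to lengthening the timeout in the
startup process's wait call, roughly as below (a sketch; the exact call
site and wait event in the patch may differ):

    /* sleep for up to 60s, but wake immediately if the latch is set */
    (void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     60000L,    /* previously 5000L */
                     WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL);
    ResetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);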

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Reducing power consumption on idle servers

2022-11-16 Thread Simon Riggs
On Sun, 13 Nov 2022 at 23:07, Simon Riggs  wrote:
>
> On Sun, 13 Nov 2022 at 21:28, Thomas Munro  wrote:
> >
> > On Mon, Nov 14, 2022 at 5:52 AM Simon Riggs
> >  wrote:
> > > The attached patch is a reduced version of the original. It covers only:
> > > * deprecation of the promote_trigger_file - there are no tests that
> > > use that, hence why there is no test coverage for the patch
> > > * changing the sleep time of the startup process to 60s
> > > * docs and comments
> >
> > LPGTM.  If we also fix the bogus SIGALRM wakeups[1], then finally a
> > completely idle recovery process looks like:
> >
> > kevent(8,0x0,0,{ },1,{ 60.0 })   = 0 (0x0)
> > kevent(8,0x0,0,{ },1,{ 60.0 })   = 0 (0x0)
> > kevent(8,0x0,0,{ },1,{ 60.0 })   = 0 (0x0)
> >
> > Presumably it would have no timeout at all in the next release.
> >
> > [1] 
> > https://www.postgresql.org/message-id/calj2acuiyn+zmpguvmgeoy1u7ino2qsvqrnufk8swpvk3a8...@mail.gmail.com
>
> Clearly, I haven't been watching Hackers! Thanks for the nudge.
>
> See if this does the trick?

Looks like the SIGALRM wakeups will be silenced on the other thread,
so we can just revert back to using v6 of this patch.

Reposting v6 now so that patch tester doesn't think this has failed
when the patch on other thread gets applied.

-- 
Simon Riggs                http://www.EnterpriseDB.com/


hibernate_startup.v6.patch
Description: Binary data


Re: when the startup process doesn't (logging startup delays)

2022-11-16 Thread Simon Riggs
On Wed, 16 Nov 2022 at 06:47, Bharath Rupireddy
 wrote:
>
> On Tue, Nov 15, 2022 at 10:55 PM Robert Haas  wrote:
> >
> > On Tue, Nov 15, 2022 at 8:33 AM Bharath Rupireddy
> >  wrote:
> > > Please review the v2 patch.
> >
> > It seems to me that this will call disable_startup_progress_timeout
> > once per WAL record, which seems like an unnecessary expense. How
> > about leaving the code inside the loop just as we have it, and putting
> > if (StandbyMode) disable_startup_progress_timeout() before entering
> > the loop?
>
> That can be done, only if we can disable the timeout in another place
> when the StandbyMode is set to true in ReadRecord(), that is, after
> the standby server finishes crash recovery and enters standby mode.
>
> I'm attaching the v3 patch for further review. Please find the CF
> entry here - https://commitfest.postgresql.org/41/4012/.

begin_startup_progress_phase() checks twice to see if the feature is
disabled, so I think you can skip its own check and just rely on the
check in enable().
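
For clarity, the overall shape being discussed is roughly this (a
sketch using the patch's function names; treat it as illustrative):

    /*
     * On a standby there is no startup progress to report, so disable
     * the timeout once, before the loop, not once per WAL record.
     */
    if (StandbyMode)
        disable_startup_progress_timeout();

    for (;;)
    {
        /* ... redo apply loop, with no per-record timeout calls ... */
    }

plus a second disable at the point where ReadRecord() first switches
into standby mode after crash recovery.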

Otherwise, all good.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Hash index build performance tweak from sorting

2022-11-15 Thread Simon Riggs
n do this in
> the code, but I feel like we do it less often in newer code. e.g we do
> it in aset.c but not generation.c (which is much newer than aset.c).
> My personal preference would be just to name the struct
> HashInsertState and have no extra pointer typedefs.

Not done, but not disagreeing either, just not very comfortable
actually making those changes.

--
Simon Riggs                http://www.EnterpriseDB.com/


hash_inserted_sorted.v3.patch
Description: Binary data


Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-11-15 Thread Simon Riggs
On Wed, 16 Nov 2022 at 00:09, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > On Tue, 15 Nov 2022 at 21:03, Tom Lane  wrote:
> >> The subxidStatus.overflowed check quoted above has a similar sort
> >> of myopia: it's checking whether our current transaction has
> >> already suboverflowed.  But (a) that doesn't prove it won't suboverflow
> >> later, and (b) the relevant logic in XidInMVCCSnapshot needs to run
> >> SubTransGetTopmostTransaction if *any* proc in the snapshot has
> >> suboverflowed.
>
> > Not the way it is coded now.
>
> > First, we search the subxid cache in snapshot->subxip.
> > Then, and only if the snapshot overflowed (i.e. ANY xact overflowed),
> > do we check subtrans.
>
> No, that's not what XidInMVCCSnapshot does. If snapshot->suboverflowed
> is set (ie, somebody somewhere/somewhen overflowed), then it does
> SubTransGetTopmostTransaction and searches only the xips with the result.
> This behavior requires that all live subxids be correctly mapped by
> SubTransGetTopmostTransaction, or we'll draw false conclusions.

Your comments are correct wrt to the existing coding, but not to the
patch, which is coded as described and does not suffer those issues.


> We could perhaps make it do what you suggest,

Already done in the patch since v5.


> but that would require
> a complete performance analysis to make sure we're not giving up
> more than we would gain.

I agree that a full performance analysis is sensible and an objective
analysis has been performed by Julien Tachoires. This is described
here, along with other explanations:
https://docs.google.com/presentation/d/1A7Ar8_LM5EdC2OHL_j3U9J-QwjMiGw9mmXeBLJOmFlg/edit?usp=sharing

It is important to understand the context here: there is already a
well documented LOSS of performance with the current coding. The patch
alleviates that, and I have not been able to find a performance case
where there is any negative impact.

Further tests welcome.


> Also, both GetSnapshotData and CopySnapshot assume that the subxips
> array is not used if suboverflowed is set, and don't bother
> (continuing to) populate it.  So we would need code changes and
> additional cycles in those areas too.

Already done in the patch since v5.

Any additional cycles apply only to the case of snapshot overflow,
which currently performs very badly.


> I'm not sure about your other claims, but I'm pretty sure this one
> point is enough to kill the patch.

Then please look again because there are misunderstandings above.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Slow standby snapshot

2022-11-15 Thread Simon Riggs
On Wed, 16 Nov 2022 at 00:15, Tom Lane  wrote:
>
> Andres Freund  writes:
> > On 2022-11-15 23:14:42 +, Simon Riggs wrote:
> >> Hence more frequent compression is effective at reducing the overhead.
> >> But too frequent compression slows down the startup process, which
> >> can't then keep up.
> >> So we're just looking for an optimal frequency of compression for any
> >> given workload.
>
> > What about making the behaviour adaptive based on the amount of wasted 
> > effort
> > during those two operations, rather than just a hardcoded "emptiness" 
> > factor?
>
> Not quite sure how we could do that, given that those things aren't even
> happening in the same process.  But yeah, it does feel like the proposed
> approach is only going to be optimal over a small range of conditions.

I have not been able to think of a simple way to autotune it.

> > I don't think the xids % KAX_COMPRESS_FREQUENCY == 0 filter is a good idea -
> > if you have a workload with plenty subxids you might end up never 
> > compressing
> > because xids divisible by KAX_COMPRESS_FREQUENCY will end up as a subxid
> > most/all of the time.
>
> Yeah, I didn't think that was too safe either.

> It'd be more reliable
> to use a static counter to skip all but every N'th compress attempt
> (something we could do inside KnownAssignedXidsCompress itself, instead
> of adding warts at the call sites).

I was thinking exactly that myself, for the reason of keeping it all
inside KnownAssignedXidsCompress().
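
Something like this minimal sketch is what I have in mind (names are
illustrative and the final patch may differ):

    static void
    KnownAssignedXidsCompress(bool force)
    {
        /* static, so the count survives across calls in this process */
        static int  transactionEndsCounter = 0;

        /* skip all but every N'th attempt, unless the caller insists */
        if (!force && ++transactionEndsCounter < KAX_COMPRESS_FREQUENCY)
            return;
        transactionEndsCounter = 0;

        /*
         * ... existing logic: take ProcArrayLock if not already held,
         * squeeze out invalid entries between tail and head ...
         */
    }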

--
Simon Riggs                http://www.EnterpriseDB.com/




Re: Slow standby snapshot

2022-11-15 Thread Simon Riggs
On Tue, 15 Nov 2022 at 23:06, Tom Lane  wrote:
>
> BTW, while nosing around this code I came across this statement
> (procarray.c, about line 4550 in HEAD):
>
>  * ... To add XIDs to the array, we just insert
>  * them into slots to the right of the head pointer and then advance the head
>  * pointer.  This wouldn't require any lock at all, except that on machines
>  * with weak memory ordering we need to be careful that other processors
>  * see the array element changes before they see the head pointer change.
>  * We handle this by using a spinlock to protect reads and writes of the
>  * head/tail pointers.  (We could dispense with the spinlock if we were to
>  * create suitable memory access barrier primitives and use those instead.)
>  * The spinlock must be taken to read or write the head/tail pointers unless
>  * the caller holds ProcArrayLock exclusively.
>
> Nowadays we've *got* those primitives.  Can we get rid of
> known_assigned_xids_lck, and if so would it make a meaningful
> difference in this scenario?

I think you could do that *as well*, since it does act as an overhead
but that is not related to the main issues:

* COMMITs: xids are removed from the array by performing a binary
search - this gets more and more expensive as the array gets wider
* SNAPSHOTs: scanning the array for snapshots becomes more expensive
as the array gets wider

Hence more frequent compression is effective at reducing the overhead.
But too frequent compression slows down the startup process, which
can't then keep up.

So we're just looking for an optimal frequency of compression for any
given workload.
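
On the barrier point, the spinlock could in principle be replaced by
pairing a write barrier on the producer side with a read barrier on the
consumer side; a sketch, not tested (field names as I recall them from
procarray.c, so treat them as assumptions):

    /* writer (startup process): publish the entry before moving head */
    KnownAssignedXids[head] = xid;
    pg_write_barrier();
    procArray->headKnownAssignedXids = head + 1;

    /* reader (backend building a snapshot): load head before scanning */
    head = procArray->headKnownAssignedXids;
    pg_read_barrier();
    for (i = tail; i < head; i++)
        /* ... examine KnownAssignedXids[i] ... */ ;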

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-11-15 Thread Simon Riggs
On Tue, 15 Nov 2022 at 21:03, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > New version attached.
>
> I looked at this patch and I do not see how it can possibly be safe.

I grant you it is complex, so please bear with me.


> The most direct counterexample arises from the fact that
> HeapCheckForSerializableConflictOut checks SubTransGetTopmostTransaction
> in some cases.  You haven't tried to analyze when, but just disabled
> the optimization in serializable mode:
>
> + * 2. When IsolationIsSerializable() we sometimes need to access 
> topxid.
> + *This occurs only when SERIALIZABLE is requested by app user.
> +...
> +if (MyProc->subxidStatus.overflowed ||
> +IsolationIsSerializable() ||
>
> However, what this is checking is whether *our current transaction*
> is serializable.  If we're not serializable, but other transactions
> in the system are, then this fails to store information that they'll
> need for correctness.  You'd have to have some way of knowing that
> no transaction in the system is using serializable mode, and that
> none will start to do so while this transaction is still in-doubt.

Not true.

src/backend/storage/lmgr/README-SSI   says this...

* Any transaction which is run at a transaction isolation level
other than SERIALIZABLE will not be affected by SSI.  If you want to
enforce business rules through SSI, all transactions should be run at
the SERIALIZABLE transaction isolation level, and that should
probably be set as the default.

If HeapCheckForSerializableConflictOut() cannot find a subxid's parent
then it will not be involved in serialization errors.

So skipping the storage of subxids in subtrans for non-serializable
xacts is valid for both SSI and non-SSI xacts.

Thus the owning transaction can decide to skip the insert into
subtrans if it is not serializable.

> I fear that's already enough to kill the idea; but there's more.
> The subxidStatus.overflowed check quoted above has a similar sort
> of myopia: it's checking whether our current transaction has
> already suboverflowed.  But (a) that doesn't prove it won't suboverflow
> later, and (b) the relevant logic in XidInMVCCSnapshot needs to run
> SubTransGetTopmostTransaction if *any* proc in the snapshot has
> suboverflowed.

Not the way it is coded now.

First, we search the subxid cache in snapshot->subxip.
Then, and only if the snapshot overflowed (i.e. ANY xact overflowed),
do we check subtrans.

Thus, the owning xact knows that anyone else will find the first 64
xids in the subxid cache, so it need not insert them into subtrans,
even if someone else overflowed. When the owning xact overflows, it
knows it must now insert the subxid into subtrans before the xid is
used anywhere in storage, which the patch does. This allows each
owning xact to decide what to do, independent of the actions of
others.
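
In outline, the patched lookup order is as follows (simplified pseudo-C
of my reading of the patch, not the literal code; xid_in_array() stands
in for the existing search helper):

    /* 1. search the top-level xids in the snapshot */
    if (xid_in_array(xid, snapshot->xip, snapshot->xcnt))
        return true;

    /* 2. always search the cached subxids, even on overflow */
    if (xid_in_array(xid, snapshot->subxip, snapshot->subxcnt))
        return true;

    /* 3. only if the snapshot overflowed, map through pg_subtrans
     *    and re-check against the top-level xids */
    if (snapshot->suboverflowed)
    {
        xid = SubTransGetTopmostTransaction(xid);
        return xid_in_array(xid, snapshot->xip, snapshot->xcnt);
    }
    return false;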

> Lastly, I don't see what the "transaction on same page" business
> has got to do with anything.  The comment is certainly failing
> to make the case that it's safe to skip filling subtrans when that
> is true.

That seems strange, I grant you. It's the same logic that is used in
TransactionIdSetTreeStatus(), in reverse. I understand it 'cos I wrote
it.

TRANSACTION_STATUS_SUB_COMMITTED is only ever used if the topxid and
subxid are on different pages. Therefore TransactionIdDidCommit()
won't ever see a value of TRANSACTION_STATUS_SUB_COMMITTED unless they
are on separate pages. So the owning transaction can predict in
advance whether anyone will ever call SubTransGetParent() for one of
its xids. If they might, then we record the values just in case. If
they NEVER will, then we can skip recording them.
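
The page test itself is trivial, roughly this (constants per clog.c,
which packs four xacts per byte):

    #define CLOG_XACTS_PER_PAGE  (BLCKSZ * 4)

    static inline bool
    TransactionIdsAreOnSameXactPage(TransactionId xid1, TransactionId xid2)
    {
        return (xid1 / CLOG_XACTS_PER_PAGE) == (xid2 / CLOG_XACTS_PER_PAGE);
    }

If topxid and subxid share a clog page, TransactionIdSetTreeStatus()
updates them atomically and SUB_COMMITTED is never externally visible.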


And just to be clear, all 3 of the above preconditions must be true
before the owning xact decides to skip writing a subxid to subtrans.

> I think we could salvage this small idea:
>
> + * Insert entries into subtrans for this xid, noting that the 
> entry
> + * points directly to the topxid, not the immediate parent. This 
> is
> + * done for two reasons:
> + * (1) so it is faster in a long chain of subxids, because the
> + * algorithm is then O(1), no matter how many subxids are 
> assigned.
>
> but some work would be needed to update the comments around
> SubTransGetParent and SubTransGetTopmostTransaction to explain that
> they're no longer reliably different.  I think that that is okay for
> the existing use-cases, but they'd better be documented.  In fact,
> couldn't we simplify them down to one function?  Given the restriction
> that we don't look back in pg_subtrans further than TransactionXmin,
> I don't think that updated code would ever need to resolve cases
> written by older code.  So we could remove the recursive checks
> entirely, o

Re: when the startup process doesn't (logging startup delays)

2022-11-15 Thread Simon Riggs
On Tue, 15 Nov 2022 at 13:33, Bharath Rupireddy
 wrote:
>
> On Mon, Nov 14, 2022 at 9:31 PM Robert Haas  wrote:
> >
> > On Mon, Nov 14, 2022 at 7:37 AM Simon Riggs
> >  wrote:
> > > > Whilte at it, I noticed that we report redo progress for PITR, but we
> > > > don't report when standby enters archive recovery mode, say due to a
> > > > failure in the connection to primary or after the promote signal is
> > > > found. Isn't it useful to report in this case as well to know the
> > > > recovery progress?
> > >
> > > I think your patch disables progress too early, effectively turning
> > > off the standby progress feature. The purpose was to report on things
> > > that take long periods during recovery, not just prior to recovery.
> > >
> > > I would advocate that we disable progress only while waiting, as I've 
> > > done here:
> > > https://www.postgresql.org/message-id/CANbhV-GcWjZ2cmj0uCbZDWQUHnneMi_4EfY3dVWq0-yD5o7Ccg%40mail.gmail.com
> >
> > Maybe I'm confused here, but I think that, on a standby, startup
> > progress messages are only printed until the main redo loop is
> > reached. Otherwise, we would print a message on a standby every 10s
> > forever, which seems like a thing that most users would not like. So I
> > think that Bharath has the right idea here.
>
> Yes, the idea is to disable the timeout on standby completely since we
> actually don't report any recovery progress. Keeping it enabled,
> unnecessarily calls startup_progress_timeout_handler() every
> log_startup_progress_interval seconds i.e. 10 seconds. That's the
> intention of the patch.

As long as we don't get the SIGALRMs that Thomas identified, I'm happy.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: New docs chapter on Transaction Management and related changes

2022-11-15 Thread Simon Riggs
On Tue, 8 Nov 2022 at 03:41, Bruce Momjian  wrote:
>
> On Mon, Nov  7, 2022 at 10:58:05AM +, Simon Riggs wrote:
> > What I've posted is the merged patch, i.e. your latest patch, plus
> > changes to RELEASE SAVEPOINT from you on Oct 16, plus changes based on
> > the later comments from Robert and I.
>
> Thanks.  I have two changes to your patch.  First, I agree "destroy" is
> the wrong word for this, but I don't think "subcommit" is good, for
> three reasons:
>
> 1. Release merges the non-aborted changes into the previous transaction
> _and_ frees their resources --- "subcommit" doesn't have both meanings,
> which I think means if we need a single word, we should use "release"
> and later define what that means.
>
> 2. The "subcommit" concept doesn't closely match the user-visible
> behavior, even though we use subtransactions to accomplish this. Release
> is more of a rollup/merge into the previously-active
> transaction/savepoint.
>
> 3. "subcommit" is an implementation detail that I don't think we should
> expose to users in the manual pages.

I don't understand this - you seem to be presuming that "subcommit"
means something different and then objecting to that difference.

For me, "Subcommit" exactly matches what is happening because the code
comments and details already use Subcommit in exactly this way.

The main purpose of this patch is to talk about what is happening
using the same language as we do in the code. The gap between the code
and the docs isn't helping anyone.

> I adjusted the first paragraph of RELEASE SAVEPOINT to highlight the
> above issues.  My original patch had similar wording.
>
> The first attachment shows my changes to your patch, and the second
> attachment is my full patch.

OK, though this makes the patch tester look like this doesn't apply.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-11-15 Thread Simon Riggs
On Mon, 7 Nov 2022 at 21:14, Simon Riggs  wrote:

> These results are compelling, thank you.
>
> Setting this to Ready for Committer.

New version attached.

-- 
Simon Riggs                http://www.EnterpriseDB.com/


002_minimize_calls_to_SubTransSetParent.v12.patch
Description: Binary data


Re: Allow single table VACUUM in transaction block

2022-11-15 Thread Simon Riggs
On Mon, 14 Nov 2022 at 19:52, Simon Riggs  wrote:
>
> On Tue, 8 Nov 2022 at 03:10, Simon Riggs  wrote:
> >
> > On Mon, 7 Nov 2022 at 08:20, Simon Riggs  
> > wrote:
> >
> > > Temp tables are actually easier, since we don't need any of the
> > > concurrency features we get with lazy vacuum.
>
> > Thoughts?
>
> New patch, which does this, when in a xact block
>
> 1. For temp tables, only VACUUM FULL is allowed
> 2. For persistent tables, an AV task is created to perform the vacuum,
> which eventually performs a vacuum
>
> The patch works, but there are various aspects of the design that need
> input. Thoughts?

New version.

-- 
Simon Riggs                http://www.EnterpriseDB.com/


single_table_vacuum_in_xact.v4.patch
Description: Binary data


Re: Allow single table VACUUM in transaction block

2022-11-14 Thread Simon Riggs
On Tue, 8 Nov 2022 at 03:10, Simon Riggs  wrote:
>
> On Mon, 7 Nov 2022 at 08:20, Simon Riggs  wrote:
>
> > Temp tables are actually easier, since we don't need any of the
> > concurrency features we get with lazy vacuum.

> Thoughts?

New patch, which does this, when in a xact block

1. For temp tables, only VACUUM FULL is allowed
2. For persistent tables, an AV task is created to perform the vacuum,
which eventually performs a vacuum

The patch works, but there are various aspects of the design that need
input. Thoughts?

-- 
Simon Riggs                http://www.EnterpriseDB.com/


single_table_vacuum_in_xact.v3.patch
Description: Binary data


Re: when the startup process doesn't (logging startup delays)

2022-11-14 Thread Simon Riggs
On Tue, 8 Nov 2022 at 12:33, Bharath Rupireddy
 wrote:
>
> On Tue, Nov 8, 2022 at 4:35 PM Thomas Munro  wrote:
> >
> > On Sat, Oct 30, 2021 at 7:44 AM Robert Haas  wrote:
> > > Committed.
> >
> > Is it expected that an otherwise idle standby's recovery process
> > receives SIGALRM every N seconds, or should the timer be canceled at
> > that point, as there is no further progress to report?
>
> Nice catch. Yeah, that seems unnecessary, see the below standby logs.
> I think we need to disable_timeout(STARTUP_PROGRESS_TIMEOUT, false);,
> something like the attached? I think there'll be no issue with the
> patch since the StandbyMode gets reset only at the end of recovery (in
> FinishWalRecovery()) but it can very well be set during recovery (in
> ReadRecord()). Note that I also added an assertion in
> has_startup_progress_timeout_expired(), just in case.
>
> 2022-11-08 11:28:23.563 UTC [980909] LOG:  SIGALRM handle_sig_alarm received
> 2022-11-08 11:28:23.563 UTC [980909] LOG:
> startup_progress_timeout_handler called
> 2022-11-08 11:28:33.563 UTC [980909] LOG:  SIGALRM handle_sig_alarm received
> 2022-11-08 11:28:33.563 UTC [980909] LOG:
> startup_progress_timeout_handler called
> 2022-11-08 11:28:43.563 UTC [980909] LOG:  SIGALRM handle_sig_alarm received
> 2022-11-08 11:28:43.563 UTC [980909] LOG:
> startup_progress_timeout_handler called
> 2022-11-08 11:28:53.563 UTC [980909] LOG:  SIGALRM handle_sig_alarm received
> 2022-11-08 11:28:53.563 UTC [980909] LOG:
> startup_progress_timeout_handler called
>
> Whilte at it, I noticed that we report redo progress for PITR, but we
> don't report when standby enters archive recovery mode, say due to a
> failure in the connection to primary or after the promote signal is
> found. Isn't it useful to report in this case as well to know the
> recovery progress?

I think your patch disables progress too early, effectively turning
off the standby progress feature. The purpose was to report on things
that take long periods during recovery, not just prior to recovery.

I would advocate that we disable progress only while waiting, as I've done here:
https://www.postgresql.org/message-id/CANbhV-GcWjZ2cmj0uCbZDWQUHnneMi_4EfY3dVWq0-yD5o7Ccg%40mail.gmail.com

--
Simon Riggs                http://www.EnterpriseDB.com/




Re: Reducing power consumption on idle servers

2022-11-13 Thread Simon Riggs
On Sun, 13 Nov 2022 at 21:28, Thomas Munro  wrote:
>
> On Mon, Nov 14, 2022 at 5:52 AM Simon Riggs
>  wrote:
> > The attached patch is a reduced version of the original. It covers only:
> > * deprecation of the promote_trigger_file - there are no tests that
> > use that, hence why there is no test coverage for the patch
> > * changing the sleep time of the startup process to 60s
> > * docs and comments
>
> LPGTM.  If we also fix the bogus SIGALRM wakeups[1], then finally a
> completely idle recovery process looks like:
>
> kevent(8,0x0,0,{ },1,{ 60.0 })   = 0 (0x0)
> kevent(8,0x0,0,{ },1,{ 60.0 })   = 0 (0x0)
> kevent(8,0x0,0,{ },1,{ 60.0 })   = 0 (0x0)
>
> Presumably it would have no timeout at all in the next release.
>
> [1] 
> https://www.postgresql.org/message-id/calj2acuiyn+zmpguvmgeoy1u7ino2qsvqrnufk8swpvk3a8...@mail.gmail.com

Clearly, I haven't been watching Hackers! Thanks for the nudge.

See if this does the trick?

-- 
Simon Riggs                http://www.EnterpriseDB.com/


hibernate_startup.v7.patch
Description: Binary data


Re: Reducing power consumption on idle servers

2022-11-13 Thread Simon Riggs
On Thu, 24 Mar 2022 at 16:02, Simon Riggs  wrote:
>
> On Thu, 24 Mar 2022 at 15:39, Robert Haas  wrote:
> >
> > On Thu, Mar 24, 2022 at 6:59 AM Simon Riggs
> >  wrote:
> > > The proposals of this patch are the following, each of which can be
> > > independently accepted/rejected:
> > > 1. fix the sleep pattern of bgwriter, walwriter and logical worker
> > > (directly affects powersave)
> > > 2. deprecate promote_trigger_file, which then allows us to fix the
> > > sleep for startup process (directly affects powersave)
> > > 3. treat hibernation in all procs the same, for simplicity, and to
> > > make sure we don't forget one later
> > > 4. provide a design pattern for background worker extensions to
> > > follow, so as to encourage powersaving
> >
> > Unfortunately, the patch isn't split in a way that corresponds to this
> > division. I think I'm +1 on change #2 -- deprecating
> > promote_trigger_file seems like a good idea to me independently of any
> > power-saving considerations. But I'm not sure that I am on board with
> > any of the other changes.
>
> OK, so change (2) is good. Separate patch attached.

Thanks to Ian for prompting me to pick up this thread again; apologies
for getting distracted.

The attached patch is a reduced version of the original. It covers only:
* deprecation of the promote_trigger_file - there are no tests that
use that, hence why there is no test coverage for the patch
* changing the sleep time of the startup process to 60s
* docs and comments

Other points will be discussed in another branch of this thread.

-- 
Simon Riggs                http://www.EnterpriseDB.com/


hibernate_startup.v6.patch
Description: Binary data


Re: New docs chapter on Transaction Management and related changes

2022-11-10 Thread Simon Riggs
On Mon, 7 Nov 2022 at 22:04, Laurenz Albe  wrote:
>
> On Sat, 2022-11-05 at 10:08 +, Simon Riggs wrote:
> > Agreed; new compilation patch attached, including mine and then
> > Robert's suggested rewordings.
>
> Thanks.  There is clearly a lot of useful information in this.
>
> Some comments:
>
> > --- a/doc/src/sgml/func.sgml
> > +++ b/doc/src/sgml/func.sgml
> > @@ -24673,7 +24673,10 @@ SELECT collation for ('foo' COLLATE "de_DE");
> > 
> >  Returns the current transaction's ID.  It will assign a new one if 
> > the
> >  current transaction does not have one already (because it has not
> > -performed any database updates).
> > +performed any database updates);  see <xref
> > +linkend="transaction-id"/> for details.  If executed in a
> > +subtransaction this will return the top-level xid;  see <xref
> > +linkend="subxacts"/> for details.
> > 
> >
>
> I would use a comma after "subtransaction", and I think it would be better to 
> write
> "transaction ID" instead of "xid".
>
> > @@ -24690,6 +24693,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
> >  ID is assigned yet.  (It's best to use this variant if the 
> > transaction
> >  might otherwise be read-only, to avoid unnecessary consumption of 
> > an
> >  XID.)
> > +If executed in a subtransaction this will return the top-level xid.
> > 
> >
>
> Same as above.
>
> > @@ -24733,6 +24737,8 @@ SELECT collation for ('foo' COLLATE "de_DE");
> > 
> >  Returns a current snapshot, a data structure
> >  showing which transaction IDs are now in-progress.
> > +Only top-level xids are included in the snapshot; subxids are not
> > +shown;  see  for details.
> > 
> >
>
> Again, I would avoid "xid" and "subxid", or at least use "transaction ID 
> (xid)"
> and similar.
>
> > --- a/doc/src/sgml/ref/release_savepoint.sgml
> > +++ b/doc/src/sgml/ref/release_savepoint.sgml
> > @@ -34,23 +34,16 @@ RELEASE [ SAVEPOINT ] 
> > savepoint_name
> >Description
> >
> >
> > -   RELEASE SAVEPOINT destroys a savepoint previously 
> > defined
> > -   in the current transaction.
> > +   RELEASE SAVEPOINT will subcommit the subtransaction
> > +   established by the named savepoint, if one exists. This will release
> > +   any resources held by the subtransaction. If there were any
> > +   subtransactions of the named savepoint, these will also be subcommitted.
> >
> >
> >
>
> "Subtransactions of the named savepoint" is somewhat confusing; how about
> "subtransactions of the subtransaction established by the named savepoint"?
>
> If that is too long and explicit, perhaps "subtransactions of that 
> subtransaction".
>
> > @@ -78,7 +71,7 @@ RELEASE [ SAVEPOINT ] 
> > savepoint_name
> >
> >
> > It is not possible to release a savepoint when the transaction is in
> > -   an aborted state.
> > +   an aborted state, to do that use .
> >
> >
> >
>
> I think the following is more English:
> "It is not possible ... state; to do that, use ."
>
> > --- a/doc/src/sgml/ref/rollback.sgml
> > +++ b/doc/src/sgml/ref/rollback.sgml
> > @@ -56,11 +56,14 @@ ROLLBACK [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]
> >  AND CHAIN
> >  
> >   
> > -  If AND CHAIN is specified, a new transaction is
> > +  If AND CHAIN is specified, a new unaborted 
> > transaction is
> >immediately started with the same transaction characteristics (see <xref
> >linkend="sql-set-transaction"/>) as the just finished one.  
> > Otherwise,
> >no new transaction is started.
>
> I don't think that is an improvement.  "Unaborted" is an un-word.  A new 
> transaction
> is always "unaborted", isn't it?
>
> > --- a/doc/src/sgml/wal.sgml
> > +++ b/doc/src/sgml/wal.sgml
> > @@ -909,4 +910,36 @@
> > seem to be a problem in practice.
> >
> >   
> > +
> > + 
> > +
> > +  Two-Phase Transactions
> > +
> > +  
> > +   PostgreSQL supports a two-phase commit (2PC)
> [...]
> > +   pg_twophase directory. Currently-prepared
> > +   transactions can be inspected using  > +   
>

Re: Allow single table VACUUM in transaction block

2022-11-07 Thread Simon Riggs
On Mon, 7 Nov 2022 at 08:20, Simon Riggs  wrote:

> Temp tables are actually easier, since we don't need any of the
> concurrency features we get with lazy vacuum. So the answer is to
> always run a VACUUM FULL on temp tables since this skips any issues
> with indexes etc..

So I see 3 options for what to do next

1. Force the FULL option for all tables, when executed in a
transaction block. This gets round the reasonable objections to
running a concurrent vacuum in a shared xact block. As Justin points
out, CLUSTER is already supported, which uses the same code.

2. Force the FULL option for temp tables, when executed in a
transaction block. In a later patch, queue up an autovacuum run for
regular tables.

3. Return with feedback this patch. (But then what happens with temp tables?)

Thoughts?

--
Simon Riggs                http://www.EnterpriseDB.com/




Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-11-07 Thread Simon Riggs
On Tue, 1 Nov 2022 at 08:55, Julien Tachoires  wrote:
>
> > > 4. Test results with transaction 1
> > >
> > > TPS vs number of sub-transaction
> > >
> > > nsubx  HEAD  patched
> > > 
> > >   0   441109  439474
> > >   8   439045  438103
> > >  16   439123  436993
> > >  24   436269  434194
> > >  32   439707  437429
> > >  40   439997  437220
> > >  48   439388  437422
> > >  56   439409  437210
> > >  64   439748  437366
> > >  72    92869  434448
> > >  80    66577  434100
> > >  88    61243  434255
> > >  96    57016  434419
> > > 104    52132  434917
> > > 112    49181  433755
> > > 120    46581  434044
> > > 128    44067  434268
> > >
> > > Perf profiling on HEAD with 80 sub-transactions:
> > > Overhead  Symbol
> > >   51.26%  [.] LWLockAttemptLock
> > >   24.59%  [.] LWLockRelease
> > >0.36%  [.] base_yyparse
> > >0.35%  [.] PinBuffer
> > >0.34%  [.] AllocSetAlloc
> > >0.33%  [.] hash_search_with_hash_value
> > >0.22%  [.] LWLockAcquire
> > >0.20%  [.] UnpinBuffer
> > >0.15%  [.] SimpleLruReadPage_ReadOnly
> > >0.15%  [.] _bt_compare
> > >
> > > Perf profiling on patched with 80 sub-transactions:
> > > Overhead  Symbol
> > >   2.64%  [.] AllocSetAlloc
> > >   2.09%  [.] base_yyparse
> > >   1.76%  [.] hash_search_with_hash_value
> > >   1.62%  [.] LWLockAttemptLock
> > >   1.26%  [.] MemoryContextAllocZeroAligned
> > >   0.93%  [.] _bt_compare
> > >   0.92%  [.] expression_tree_walker_impl.part.4
> > >   0.84%  [.] SearchCatCache1
> > >   0.79%  [.] palloc
> > >   0.64%  [.] core_yylex
> > >
> > > 5. Test results with transaction 2
> > >
> > > nsubx  HEAD  patched
> > > 
> > >   0  440145  443816
> > >   8  438867  443081
> > >  16  438634  441786
> > >  24  436406  440187
> > >  32  439203  442447
> > >  40  439819  443574
> > >  48  439314  442941
> > >  56  439801  443736
> > >  64  439074  441970
> > >  72  439833  444132
> > >  80  148737  439941
> > >  88  413714  443343
> > >  96  251098  442021
> > > 104   70190  443488
> > > 112  405507  438866
> > > 120  177827  443202
> > > 128  399431  441842
> > >
> > > From the performance point of view, this patch clearly fixes the
> > > dramatic TPS collapse shown in these tests.
> >
> > I think these are really promising results.  Although the perf result
> > shows that the bottleneck on the SLRU is no longer there with the patch,
> > I think it would be nice to see the wait event as well.
>
> Please find attached samples returned by the following query when
> testing transaction 1 with 80 subxacts:
> SELECT wait_event_type, wait_event, locktype, mode, database,
> relation, COUNT(*) from pg_stat_activity AS psa JOIN pg_locks AS pl ON
> (psa.pid = pl.pid) GROUP BY 1, 2, 3, 4, 5, 6 ORDER BY 7 DESC;

These results are compelling, thank you.

Setting this to Ready for Committer.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Code checks for App Devs, using new options for transaction behavior

2022-11-07 Thread Simon Riggs
On Wed, 2 Nov 2022 at 07:40, Simon Riggs  wrote:

> > What will be the behavior if someone declares a savepoint with this
> > name ("_internal_nested_xact").  Will this interfere with this new
> > functionality?
>
> Clearly! I guess you are saying we should disallow them.
>
> > Have we tested scenarios like that?
>
> No, but that can be done.

More tests as requested, plus minor code rework, plus comment updates.

-- 
Simon Riggs                http://www.EnterpriseDB.com/


002_nested_xacts.v10.patch
Description: Binary data


001_psql_parse_only.v1.patch
Description: Binary data


003_rollback_on_commit.v1.patch
Description: Binary data


Re: Error for WITH options on partitioned tables

2022-11-07 Thread Simon Riggs
On Mon, 7 Nov 2022 at 08:55, Karina Litskevich
 wrote:
>
> Hi David,
>
> > I am not very clear about why `build_reloptions` is removed in patch
> > `v2-0002-better-error-message-for-setting-parameters-for-p.patch`, if
> > you can help explain would be great.
>
> "build_reloptions" parses "reloptions" and takes for it a list of allowed
> options defined by the 5th argument "relopt_elems" and the 6th argument
> "num_relopt_elems", which are NULL and 0 in the removed call. If "validate"
> is false, it ignores options, which are not in the list, while parsing. If
> "validate" is true, it "elog"s ERROR when it meets option, which is not in the
> allowed list.

Karina's changes make sense to me, so +1.

This is a minor patch, so I will set this as Ready For Committer.
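
For the record, the shape of the fix as I read Karina's patch is
approximately this (paraphrased, not the literal hunk):

    static bytea *
    partitioned_table_reloptions(Datum reloptions, bool validate)
    {
        /*
         * There are no storage parameters for partitioned tables yet,
         * so any attempt to set one gets a targeted error message.
         */
        if (validate && reloptions)
            ereport(ERROR,
                    (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                     errmsg("cannot specify storage parameters for a partitioned table"),
                     errhint("Specify storage parameters for its leaf partitions instead.")));
        return NULL;
    }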

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: New docs chapter on Transaction Management and related changes

2022-11-07 Thread Simon Riggs
On Mon, 7 Nov 2022 at 07:43, Bruce Momjian  wrote:
>
> On Fri, Nov  4, 2022 at 04:17:28PM +0100, Laurenz Albe wrote:
> > On Sat, 2022-10-15 at 21:08 -0400, Bruce Momjian wrote:
> > > I therefore merged all three paragraphs into
> > > one and tried to make the text saner;  release_savepoint.sgml diff
> > > attached, URL content updated.
> >
> > I wanted to have a look at this, but I am confused.  The original patch
> > was much bigger.  Is this just an incremental patch?  If yes, it would
> > be nice to have a "grand total" patch, so that I can read it all
> > in one go.
>
> Yeah, I said:
>
> Yes, I didn't like global either --- I like your wording.  I added your
> other changes too, with slight rewording.  Merged patch to be posted in
> a later email.
>
> but that was unclear.  Let me post one now, and Simon posted one too.

What I've posted is the merged patch, i.e. your latest patch, plus
changes to RELEASE SAVEPOINT from you on Oct 16, plus changes based on
the later comments from Robert and I.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Allow single table VACUUM in transaction block

2022-11-07 Thread Simon Riggs
On Sun, 6 Nov 2022 at 20:40, Peter Geoghegan  wrote:
>
> On Sun, Nov 6, 2022 at 11:14 AM Tom Lane  wrote:
> > In general, I do not believe in encouraging users to run VACUUM
> > manually in the first place.  We would be far better served by
> > spending our effort to improve autovacuum's shortcomings.
>
> I couldn't agree more. A lot of problems seem related to the idea that
> VACUUM is just a command that the DBA periodically runs to get a
> predictable fixed result, a little like CREATE INDEX. That conceptual
> model isn't exactly wrong; it just makes it much harder to apply any
> kind of context about the needs of the table over time. There is a
> natural cycle to how VACUUM (really autovacuum) is run, and the
> details matter.
>
> There is a significant amount of relevant context that we can't really
> use right now. That wouldn't be true if VACUUM only ran within an
> autovacuum worker (by definition). The VACUUM command itself would
> still be available, and support the same user interface, more or less.
> Under the hood the VACUUM command would work by enqueueing a VACUUM
> job, to be performed asynchronously by an autovacuum worker. Perhaps
> the initial enqueue operation could be transactional, fixing Simon's 
> complaint.

Ah, I see you got to this idea first!

Yes, what we need is for the "VACUUM command" to not fail in a script.
Not sure anyone cares where the work takes place.

Enqueuing a request for autovacuum to do that work, then blocking
until it is complete would do the job.
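
In pseudo-C, the synchronous wrapper might look something like this
(an entirely hypothetical API -- none of these functions exist today):

    /* executed by the VACUUM command when inside a transaction block */
    handle = RequestBackgroundVacuum(MyDatabaseId, relid, params);

    /* block until the autovacuum worker reports completion */
    while (!BackgroundVacuumDone(handle))
    {
        (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT |
                         WL_EXIT_ON_PM_DEATH, 1000L, WAIT_EVENT_PG_SLEEP);
        ResetLatch(MyLatch);
    }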

> "No more VACUUMs outside of autovacuum" would enable more advanced
> autovacuum.c scheduling, allowing us to apply a lot more context about
> the costs and benefits, without having to treat manual VACUUM as an
> independent thing. We could coalesce together redundant VACUUM jobs,
> suspend and resume VACUUM operations, and have more strategies to deal
> with problems as they emerge.

+1, but clearly this would not make temp table VACUUMs work.

> > I'd like to see some sort of direct attack on its inability to deal
> > with temp tables, for instance.  (Force the owning backend to
> > do it?  Temporarily change the access rules so that the data
> > moves to shared buffers?  Dunno, but we sure haven't tried hard.)

This was a $DIRECT attack on making temp tables work! ;-)

Temp tables are actually easier, since we don't need any of the
concurrency features we get with lazy vacuum. So the answer is to
always run a VACUUM FULL on temp tables since this skips any issues
with indexes etc..

We would need to check a few things first, maybe something like
this (mostly borrowed heavily from COPY):

    InvalidateCatalogSnapshot();
    if (!ThereAreNoPriorRegisteredSnapshots() || !ThereAreNoReadyPortals())
        ereport(WARNING,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("vacuum of temporary table ignored because of prior transaction activity")));
    CheckTableNotInUse(rel, "VACUUM");

> This is a good example of the kind of thing I have in mind. Perhaps it
> could work by killing the backend that owns the temp relation when
> things truly get out of hand? I think that that would be a perfectly
> reasonable trade-off.

+1

> Another related idea: better behavior in the event of a manually
> issued VACUUM (now just an enqueued autovacuum) that cannot do useful
> work due to the presence of a long running snapshot. The VACUUM
> doesn't have to dutifully report "success" when there is no practical
> sense in which it was successful. There could be a back and forth
> conversation between autovacuum.c and vacuumlazy.c that makes sure
> that something useful happens sooner or later. The passage of time
> really matters here.

Regrettably, neither vacuum nor autovacuum waits for xmin to change;
perhaps it should.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Allow single table VACUUM in transaction block

2022-11-06 Thread Simon Riggs
On Sun, 6 Nov 2022 at 18:50, Peter Geoghegan  wrote:
>
> On Thu, Oct 27, 2022 at 2:31 AM Simon Riggs
>  wrote:
> > Fix, so that this works without issue:
> >
> > BEGIN;
> > 
> > VACUUM (ANALYZE) vactst;
> > 
> > COMMIT;
> >
> > Allows both ANALYZE and vacuum of toast tables, but not VACUUM FULL.
> >
> > When in a xact block, we do not set PROC_IN_VACUUM,
> > nor update datfrozenxid.
>
> It doesn't seem like a good idea to add various new special cases to
> VACUUM just to make scripts like this work.

Usability is a major concern that doesn't get a high enough priority.

> I'm pretty sure that there
> are several deep, subtle reasons why VACUUM cannot be assumed safe to
> run in a user transaction.

I expected there were, so it's good to discuss them. Thanks for the input.

> If we absolutely have to do this, then the least worst approach might
> be to make VACUUM into a no-op rather than throwing an ERROR -- demote
> the ERROR into a WARNING. You could argue that we're just arbitrarily
> deciding to not do a VACUUM just to be able to avoid throwing an error
> if we do that. But isn't that already true with the patch that we
> have? Is it really a "true VACUUM" if the operation can never advance
> datfrozenxid? At least selectively demoting the ERROR to a WARNING is
> "transparent" about it.

I'll answer that part in my reply to Tom, since there are good ideas in both.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: New docs chapter on Transaction Management and related changes

2022-11-05 Thread Simon Riggs
On Fri, 4 Nov 2022 at 15:17, Laurenz Albe  wrote:
>
> On Sat, 2022-10-15 at 21:08 -0400, Bruce Momjian wrote:
> > I therefore merged all three paragraphs into
> > one and tried to make the text saner;  release_savepoint.sgml diff
> > attached, URL content updated.
>
> I wanted to have a look at this, but I am confused.  The original patch
> was much bigger.  Is this just an incremental patch?  If yes, it would
> be nice to have a "grand total" patch, so that I can read it all
> in one go.

Agreed; new compilation patch attached, including mine and then
Robert's suggested rewordings.

-- 
Simon Riggs                http://www.EnterpriseDB.com/


xact_docs.v9.patch
Description: Binary data


Re: Allow single table VACUUM in transaction block

2022-11-04 Thread Simon Riggs
Hi Rahila,

Thanks for your review.

On Fri, 4 Nov 2022 at 07:37, Rahila Syed  wrote:

>> I would like to bring up a few points that I came across while looking into 
>> the vacuum code.
>>
>> 1.  As a result of this change to allow VACUUM inside a user transaction, I 
>> think there is some chance of causing
>> a block/delay of concurrent VACUUMs if a VACUUM is being run under a long 
>> running transaction.
>> 2. Also, if a user runs VACUUM in a transaction, performance optimizations 
>> like PROC_IN_VACUUM won't work.
>> 3. Also, if VACUUM happens towards the end of a long running transaction, 
>> the snapshot will be old
>> and xmin horizon for vacuum would be somewhat old as compared to current 
>> lazy vacuum which
>> acquires a new snapshot just before scanning the table.
>>
>> So, while I understand the need of the feature, I am wondering if there 
>> should be some mention
>> of above caveats in documentation with the recommendation that VACUUM should 
>> be run outside
>> a transaction, in general.
>>
>
> Sorry, I just noticed that you have already mentioned some of these in the 
> documentation as follows, so it seems
> it is already taken care of.
>
> +VACUUM cannot be executed inside a transaction block,
> +unless a single table is specified and FULL is not
> +specified.  When executing inside a transaction block the vacuum scan can
> +hold back the xmin horizon and does not update the database datfrozenxid,
> +as a result this usage is not useful for database maintenance, but is 
> provided
> +to allow vacuuming in special circumstances, such as temporary or private
> +work tables.

Yes, I wondered whether we should have a NOTICE or WARNING to remind
people of those points?

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Allow single table VACUUM in transaction block

2022-11-03 Thread Simon Riggs
On Tue, 1 Nov 2022 at 23:56, Simon Riggs  wrote:

> > I haven't checked the rest of the patch, but +1 for allowing VACUUM FULL
> > within a user txn.
>
> My intention was to prevent that. I am certainly quite uneasy about
> changing anything related to CLUSTER/VF, since they are old, complex
> and bug prone.
>
> So for now, I will block VF, as was my original intent.
>
> I will be guided by what others think... so you may yet get your wish.
>
>
> > Maybe the error message needs to be qualified "...when multiple
> > relations are specified".
> >
> > ERROR:  VACUUM cannot run inside a transaction block
>
> Hmm, that is standard wording based on the statement type, but I can
> set a CONTEXT message also. Will update accordingly.
>
> Thanks for your input.

New version attached, as described.

Other review comments and alternate opinions welcome.

-- 
Simon Riggs                http://www.EnterpriseDB.com/


single_table_vacuum.v2.patch
Description: Binary data


Re: Code checks for App Devs, using new options for transaction behavior

2022-11-02 Thread Simon Riggs
On Wed, 2 Nov 2022 at 03:52, Dilip Kumar  wrote:
>
> On Mon, Oct 31, 2022 at 6:54 PM Simon Riggs
>  wrote:
> >
> > > > What is the behavior if "nested_transactions" value is changed within
> > > > a transaction execution, suppose the value was on and we have created
> > > > a few levels of nested subtransactions and within the same transaction
> > > > I switched it to off or to outer?
>
> I think you missed the above comment?

[copy of earlier reply]

Patch does the same dance as with other xact variables.

XactNesting is the value within the transaction and in the patch this
is not exported, so cannot be set externally.

XactNesting is set at transaction start to the variable
DefaultXactNesting, which is set by the GUC.

So its not a problem, but thanks for checking.

> > > I am not sure whether it is good to not allow PREPARE or we can just
> > > prepare it and come out of the complete nested transaction.  Suppose
> > > we have multiple savepoints and we say prepare then it will just
> > > succeed so why does it have to be different here?
> >
> > I'm happy to discuss what the behavior should be in this case. It is
> > not a common case,
> > and people don't put PREPARE in their scripts except maybe in a test.
> >
> > My reasoning for this code is that we don't want to accept a COMMIT
> > until we reach top-level of nesting,
> > so the behavior should be similar for PREPARE, which is just
> > first-half of final commit.
>
> Yeah this is not a very common case.  And we can see opinions from
> others as well.  But I think your reasoning for doing it this way also
> makes sense to me.
>
> I have some more comments for 0002
> 1.
> +if (XactNesting == XACT_NEST_OUTER && XactNestingLevel > 0)
> +{
> +/* Throw ERROR */
> +ereport(ERROR,
> +(errmsg("nested ROLLBACK, level %u aborts
> outer transaction", XactNestingLevel--)));
> +}
>
> I did not understand in case of 'outer' if we are giving rollback from
> inner nesting level why it is throwing error?  Documentation just says
> this[1] but it did not
> mention the error.  I agree that we might need to give the rollback as
> many times as the nesting level but giving errors seems confusing to
> me.

Docs mention ROLLBACK at any level will abort the transaction, which
is what the ERROR does.

> [1]
> +   
> +A setting of outer will cause a nested
> +BEGIN to be remembered, so that an equal number
> +of COMMIT or ROLLBACK commands
> +are required to end the nesting. In that case a
> ROLLBACK
> +at any level will abort the entire outer transaction.
> +Once we reach the top-level transaction,
> +the final COMMIT will end the transaction.
> +This ensures that all commands within the outer transaction are 
> atomic.
> +   
>
>
> 2.
>
> +if (XactNesting == XACT_NEST_OUTER)
> +{
> +if (XactNestingLevel <= 0)
> +s->blockState = TBLOCK_END;
> +else
> +ereport(NOTICE,
> +(errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
> + errmsg("nested COMMIT, level %u",
> XactNestingLevel)));
> +XactNestingLevel--;
> +return true;
> +}

This is decrementing the nesting level for XACT_NEST_OUTER,
until we reach the top level, when the commit is allowed.

> +while (s->parent != NULL && !found_subxact)
>  {
> +if (XactNesting == XACT_NEST_ALL &&
> +XactNestingLevel > 0 &&
> +PointerIsValid(s->name) &&
> +strcmp(s->name, NESTED_XACT_NAME) == 0)
> +found_subxact = true;
> +
>  if (s->blockState == TBLOCK_SUBINPROGRESS)
>  s->blockState = TBLOCK_SUBCOMMIT
>
> I think these changes should be explained in the comments.

This locates the correct subxact by name, as you mention in (4)

> 3.
>
> +if (XactNesting == XACT_NEST_OUTER)
> +{
> +if (XactNestingLevel > 0)
> +{
> +ereport(NOTICE,
> +(errmsg("nested COMMIT, level %u in
> aborted transaction", XactNestingLevel)));
> +XactNestingLevel--;
> +return false;
> +}
> +}
>
> Better to writ

Re: Error for WITH options on partitioned tables

2022-11-01 Thread Simon Riggs
Apologies, I only just noticed these messages. I have set WoA until I
have read, understood and can respond to your detailed input, thanks.

On Fri, 28 Oct 2022 at 22:21, David Zhang  wrote:
>
> Hi Karina,
>
> I am not very clear about why `build_reloptions` is removed in patch
> `v2-0002-better-error-message-for-setting-parameters-for-p.patch`, if
> you can help explain would be great.
>
>  From my observation, it seems the WITH option has different behavior
> when creating partitioned table and index. For example,
>
> pgbench -i --partitions=2 --partition-method=range -d postgres
>
> postgres=# create index idx_bid on pgbench_accounts using btree(bid)
> with (fillfactor = 90);
> CREATE INDEX
>
> postgres=# select relname, relkind, reloptions from pg_class where
> relnamespace=2200 order by oid;
>relname   | relkind |reloptions
> +-+--
>   idx_bid| I   | {fillfactor=90}
>   pgbench_accounts_1_bid_idx | i   | {fillfactor=90}
>   pgbench_accounts_2_bid_idx | i   | {fillfactor=90}
>
> I can see the `fillfactor` parameter has been added to the indexes,
> however, if I try to alter `fillfactor`, I got an error like below.
> postgres=# alter index idx_bid set (fillfactor=40);
> ERROR:  ALTER action SET cannot be performed on relation "idx_bid"
> DETAIL:  This operation is not supported for partitioned indexes.
>
> This generic error message is reported by
> `errdetail_relkind_not_supported`, and there is a similar error message
> for partitioned tables. Anyone knows if this can be an option for
> reporting this `fillfactor` parameter not supported error.
>
>
> Best regards,
>
> David
>
> On 2022-10-14 8:16 a.m., Karina Litskevich wrote:
> > Hi, Simon!
> >
> > The new error message looks better. But I believe it is better to use
> > "parameters" instead of "options" as it is called "storage parameters"
> > in the documentation. I also believe it is better to report error in
> > partitioned_table_reloptions() (thanks to Japin Li for mentioning this
> > function) as it also fixes the error message in the following situation:
> >
> > test=# CREATE TABLE parted_col_comment (a int, b text) PARTITION BY
> > LIST (a);
> > CREATE TABLE
> > test=# ALTER TABLE parted_col_comment SET (fillfactor=100);
> > ERROR:  unrecognized parameter "fillfactor"
> >
> > I attached the new versions of patches.
> >
> > I'm not sure about the errcode. May be it is better to report error with
> > ERRCODE_INVALID_OBJECT_DEFINITION for CREATE TABLE and with
> > ERRCODE_WRONG_OBJECT_TYPE for ALTER TABLE (as when you try "ALTER TABLE
> > partitioned INHERIT nonpartitioned;" an ERROR with
> > ERRCODE_WRONG_OBJECT_TYPE
> > is reported). Then either the desired code should be passed to
> > partitioned_table_reloptions() or similar checks and ereport calls
> > should be
> > placed in two different places. I'm not sure whether it is a good idea to
> > change the code in one of these ways just to change the error code.
> >
> > Best regards,
> > Karina Litskevich
> > Postgres Professional: http://postgrespro.com/
>


-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Allow single table VACUUM in transaction block

2022-11-01 Thread Simon Riggs
On Thu, 27 Oct 2022 at 21:07, Justin Pryzby  wrote:
>
> On Thu, Oct 27, 2022 at 10:31:31AM +0100, Simon Riggs wrote:
> > Allows both ANALYZE and vacuum of toast tables, but not VACUUM FULL.
>
> Maybe I misunderstood what you meant: you said "not VACUUM FULL", but
> with your patch, that works:
>
> postgres=# begin; VACUUM FULL pg_class; commit;
> BEGIN
> VACUUM
> COMMIT
>
> Actually, I've thought before that it was bit weird that CLUSTER can be
> run within a transaction, but VACUUM FULL cannot (even though it does a
> CLUSTER behind the scenes).  VACUUM FULL can process multiple relations,
> whereas CLUSTER can't, but it seems nice to allow vacuum full for the
> case of a single relation.
>
> I haven't checked the rest of the patch, but +1 for allowing VACUUM FULL
> within a user txn.

My intention was to prevent that. I am certainly quite uneasy about
changing anything related to CLUSTER/VF, since they are old, complex
and bug-prone.

So for now, I will block VF, as was my original intent.

I will be guided by what others think... so you may yet get your wish.


> Maybe the error message needs to be qualified "...when multiple
> relations are specified".
>
> ERROR:  VACUUM cannot run inside a transaction block

Hmm, that is standard wording based on the statement type, but I can
set a CONTEXT message also. Will update accordingly.

Thanks for your input.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: psql: Add command to use extended query protocol

2022-11-01 Thread Simon Riggs
On Tue, 1 Nov 2022 at 20:48, Peter Eisentraut
 wrote:
>
> On 01.11.22 10:10, Simon Riggs wrote:
> > On Fri, 28 Oct 2022 at 07:53, Peter Eisentraut
> >  wrote:
> >>
> >> This adds a new psql command \gp that works like \g (or semicolon) but
> >> uses the extended query protocol.  Parameters can also be passed, like
> >>
> >>   SELECT $1, $2 \gp 'foo' 'bar'
> >
> > +1 for the concept. The patch looks simple and complete.
> >
> > I find it strange to use it the way you have shown above, i.e. \gp on
> > same line after a query.
>
> That's how all the "\g" commands work.

Yes, I see that, but it also works exactly the way I said.

i.e.
SELECT 'foo'
\g

is the same thing as

SELECT 'foo' \g

But there are no examples in the docs of the latter usage, so it is a
surprise to me and probably to others also.

> > ...since if we used this in a script, it would be used like this, I think...
> >
> >SELECT $1, $2
> >\gp 'foo' 'bar'
> >\gp 'bar' 'baz'
> >...
>
> Interesting, but I think for that we should use named prepared
> statements, so that would be a separate "\gsomething" command in psql, like
>
>  SELECT $1, $2 \gprep p1
>  \grun p1 'foo' 'bar'
>  \grun p1 'bar' 'baz'

Not sure I understand this... you seem to be arguing against your own
patch?? I quite liked the way you had it, I'm just asking for the docs
to put the \gp on the following line.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: psql: Add command to use extended query protocol

2022-11-01 Thread Simon Riggs
On Fri, 28 Oct 2022 at 07:53, Peter Eisentraut
 wrote:
>
> This adds a new psql command \gp that works like \g (or semicolon) but
> uses the extended query protocol.  Parameters can also be passed, like
>
>  SELECT $1, $2 \gp 'foo' 'bar'

+1 for the concept. The patch looks simple and complete.

I find it strange to use it the way you have shown above, i.e. \gp on
same line after a query.

For me it would be clearer to have tests and docs showing this
  SELECT $1, $2
  \gp 'foo' 'bar'

> Perhaps this would also be useful for general psql scripting.

...since if we used this in a script, it would be used like this, I think...

  SELECT $1, $2
  \gp 'foo' 'bar'
  \gp 'bar' 'baz'
  ...

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Commit fest 2022-11

2022-11-01 Thread Simon Riggs
On Mon, 31 Oct 2022 at 05:42, Michael Paquier  wrote:

> As per the world clock, the next commit fest will begin in 30 hours
> (11/1 0:00 AoE time).  I may have missed something, but it looks like
> we have no CFM for this one yet.
>
> Opinions, thoughts or volunteers?

If we have no volunteers, maybe it is because the job of CFM is too
big now and needs to be split.

I can offer to be co-CFM, and think I can offer to be CFM for about
100 patches, but I don't have the time to do the whole thing myself.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Code checks for App Devs, using new options for transaction behavior

2022-10-31 Thread Simon Riggs
On Mon, 31 Oct 2022 at 12:22, Dilip Kumar  wrote:
>
> On Mon, Oct 31, 2022 at 5:03 PM Dilip Kumar  wrote:
> >
> > On Mon, Oct 31, 2022 at 4:23 PM Dilip Kumar  wrote:
> > >
> > > On Sun, Oct 30, 2022 at 11:32 PM Simon Riggs
> > >  wrote:
> > > >
> > > > On Fri, 28 Oct 2022 at 10:33, Simon Riggs 
> > > >  wrote:
> > > >
> > > > > Thanks for the feedback, I will make all of those corrections in the
> > > > > next version.
> > > >
> > > > New version attached. I've rolled 002-004 into one patch, but can
> > > > split again as needed.
> > >
> > > I like the idea of "parse only" and "nested xact", thanks for working
> > > on this.  I will look into patches in more detail, especially nested
> > > xact. IMHO there is no point in merging "nested xact" and "rollback on
> > > commit". They might be changing the same code location but these two
> > > are completely different ideas, in fact all these three should be
> > > reviewed as three separate threads as you mentioned in the first email
> > > in the thread.
> >
> > What is the behavior if "nested_transactions" value is changed within
> > a transaction execution, suppose the value was on and we have created
> > a few levels of nested subtransactions and within the same transaction
> > I switched it to off or to outer?
>
> 1.
> @@ -3815,6 +3861,10 @@ PrepareTransactionBlock(const char *gid)
>  /* Set up to commit the current transaction */
>  result = EndTransactionBlock(false);
>
> +/* Don't allow prepare until we are back to an unnested state at level 0 
> */
> +if (XactNestingLevel > 0)
> +return false;
>
>
> I am not sure whether it is good to not allow PREPARE or we can just
> prepare it and come out of the complete nested transaction.  Suppose
> we have multiple savepoints and we say prepare then it will just
> succeed so why does it have to be different here?

I'm happy to discuss what the behavior should be in this case. It is
not a common case, and people don't put PREPARE in their scripts
except maybe in a test.

My reasoning for this code is that we don't want to accept a COMMIT
until we reach the top level of nesting, so the behavior should be
similar for PREPARE, which is just the first half of the final commit.

Note that the nesting of BEGIN/COMMIT is completely separate from the
existence or non-existence of subtransactions, especially with
nested_transactions = 'outer'.


> 2.case TBLOCK_SUBABORT:
>  ereport(WARNING,
>  (errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
>   errmsg("there is already a transaction in progress")));
> +if (XactNesting == XACT_NEST_OUTER)
> +XactNestingLevel++;
>  break;
>
> I did not understand what this change is for, can you tell me the
> scenario or a test case to hit this?

Well spotted, thanks. That seems to be some kind of artefact.

There is no test that exercises that since it is an unintended change,
so I have removed it.


> Remaining part w.r.t "nested xact" patch looks fine, I haven't tested
> it yet though.

New versions attached, separated again as you suggested.

-- 
Simon Riggs    http://www.EnterpriseDB.com/


001_psql_parse_only.v1.patch
Description: Binary data


002_nested_xacts.v9.patch
Description: Binary data


003_rollback_on_commit.v1.patch
Description: Binary data


Re: Code checks for App Devs, using new options for transaction behavior

2022-10-31 Thread Simon Riggs
On Mon, 31 Oct 2022 at 11:33, Dilip Kumar  wrote:

> What is the behavior if "nested_transactions" value is changed within
> a transaction execution, suppose the value was on and we have created
> a few levels of nested subtransactions and within the same transaction
> I switched it to off or to outer?

Patch does the same dance as with other xact variables.

XactNesting is the value within the transaction and in the patch this
is not exported, so cannot be set externally.

XactNesting is set at transaction start to the variable
DefaultXactNesting, which is set by the GUC.

So it's not a problem, but thanks for checking.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Code checks for App Devs, using new options for transaction behavior

2022-10-30 Thread Simon Riggs
On Fri, 28 Oct 2022 at 10:33, Simon Riggs  wrote:

> Thanks for the feedback, I will make all of those corrections in the
> next version.

New version attached. I've rolled 002-004 into one patch, but can
split again as needed.

-- 
Simon Riggs    http://www.EnterpriseDB.com/


001_psql_parse_only.v1.patch
Description: Binary data


002_nested_xacts_and_rollback_on_commit.v8.patch
Description: Binary data


Re: Code checks for App Devs, using new options for transaction behavior

2022-10-28 Thread Simon Riggs
On Fri, 28 Oct 2022 at 07:54, Erik Rijkers  wrote:
>
> Op 27-10-2022 om 18:35 schreef Simon Riggs:
> > On Thu, 27 Oct 2022 at 12:09, Simon Riggs  
> > wrote:
> >
> >> Comments please
> >
> > Update from patch tester results.
> >
>
>  > [001_psql_parse_only.v1.patch ]
>  > [002_nested_xacts.v7.patch]
>  > [003_rollback_on_commit.v1.patch  ]
>  > [004_add_params_to_sample.v1.patch]
>
>
> patch 002 has (2x) :
>'transction'  should be
>'transaction'
>
> also in patch 002:
>'at any level will be abort'  should be
>'at any level will abort'
>
> I also dislike the 'we' in
>
>'Once we reach the top-level transaction,'
>
> That seems a bit too much like the 'we developers working together to
> make a database server system' which is of course used often and
> usefully on this mailinglist and in code itself.  But I think
> user-facing docs should be careful with that team-building 'we'.  I
> remember well how it confused me, many years ago.  Better, IMHO:
>
>'Once the top-level transaction is reached,'

Thanks for the feedback, I will make all of those corrections in the
next version.

I'm guessing you like the features??

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Code checks for App Devs, using new options for transaction behavior

2022-10-27 Thread Simon Riggs
On Thu, 27 Oct 2022 at 12:09, Simon Riggs  wrote:

> Comments please

Update from patch tester results.

-- 
Simon Riggs    http://www.EnterpriseDB.com/


002_nested_xacts.v7.patch
Description: Binary data


001_psql_parse_only.v1.patch
Description: Binary data


003_rollback_on_commit.v1.patch
Description: Binary data


004_add_params_to_sample.v1.patch
Description: Binary data


Re: Allow single table VACUUM in transaction block

2022-10-27 Thread Simon Riggs
On Thu, 27 Oct 2022 at 10:31, Simon Riggs  wrote:

> Tests, docs.

The patch tester says that a pg_upgrade test is failing on Windows,
but works for me.

t/002_pg_upgrade.pl .. ok

If anybody can shed any light on that, it would be much appreciated.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Code checks for App Devs, using new options for transaction behavior

2022-10-27 Thread Simon Riggs
In the past, developers have wondered how we can provide "--dry-run"
functionality
https://www.postgresql.org/message-id/15791.1450383201%40sss.pgh.pa.us

This is important for application developers, especially when
migrating programs to Postgres.

Presented here are 3 features aimed at developers, each of which is
being actively used by me in a large and complex migration project.

* psql --parse-only
Checks the syntax of all SQL in a script, but without actually
executing it. This is very important in the early stages of complex
migrations because we need to see if the code would generate syntax
errors before we attempt to execute it. When there are many
dependencies between objects, actual execution fails very quickly if
we run in a single transaction, yet running outside of a transaction
can leave a difficult cleanup task. Fixing errors iteratively is
difficult when there are long chains of dependencies between objects,
since there is no easy way to predict how long it will take to make
everything work unless you understand how many syntax errors exist in
the script.
001_psql_parse_only.v1.patch
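Typical usage would be simply (an illustrative command line; only the
option name comes from the patch):

psql --parse-only -d postgres -f migration.sql  # checks syntax, runs nothing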

* nested_transactions = off (default) | all | outer
Handle nested BEGIN/COMMIT, which can cause chaos on failure. This is
an important part of guaranteeing that everything that gets executed
is part of a single atomic transaction, which can then be rolled back
- this is a pre-requisite for the last feature.
002_nested_xacts.v7.patch
The default behavior is unchanged (off)
Setting "all" treats nested BEGIN/COMMIT as subtransactions, allowing
some parts to fail without rolling back the outer transaction.
Setting "outer" flattens nested BEGIN/COMMIT into one single outer
transaction, so that any failure rolls back the entire transaction.
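As an illustration of "all" (a sketch of the intended behavior; table
t is hypothetical):

SET nested_transactions = 'all';
BEGIN;
INSERT INTO t VALUES (1);
BEGIN;                       -- treated as a subtransaction
INSERT INTO t VALUES (2);
ROLLBACK;                    -- discards only row 2
COMMIT;                      -- row 1 commits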

* rollback_on_commit = off (default) | on
Force transactions to fail their final commit, ensuring that no
lasting change is made when a script is tested. i.e. accept COMMIT,
but do rollback instead.
003_rollback_on_commit.v1.patch
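For example (sketch; table t is hypothetical):

SET rollback_on_commit = on;
BEGIN;
INSERT INTO t VALUES (1);
COMMIT;                      -- accepted, but a rollback is performed
SELECT count(*) FROM t;      -- returns 0: nothing was persisted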

We will probably want to review these on separate threads, but the
common purpose of these features is hopefully clear from these notes.

001 and 003 are fairly small patches, 002 is longer.

Comments please

-- 
Simon Riggs    http://www.EnterpriseDB.com/


001_psql_parse_only.v1.patch
Description: Binary data


002_nested_xacts.v7.patch
Description: Binary data


003_rollback_on_commit.v1.patch
Description: Binary data


Allow single table VACUUM in transaction block

2022-10-27 Thread Simon Riggs
It is a common user annoyance to have a script fail because someone
added a VACUUM, especially when using the --single-transaction option.
Fix, so that this works without issue:

BEGIN;

VACUUM (ANALYZE) vactst;

COMMIT;

Allows both ANALYZE and vacuum of toast tables, but not VACUUM FULL.

When in an xact block, we do not set PROC_IN_VACUUM,
nor update datfrozenxid.

Tests, docs.

--
Simon Riggs    http://www.EnterpriseDB.com/


single_table_vacuum.v1.patch
Description: Binary data


Re: New docs chapter on Transaction Management and related changes

2022-10-24 Thread Simon Riggs
On Sun, 16 Oct 2022 at 02:08, Bruce Momjian  wrote:
>
> On Fri, Oct 14, 2022 at 05:46:55PM -0400, Robert Treat wrote:
> > On Fri, Oct 14, 2022 at 3:51 PM Bruce Momjian  wrote:
> > > Attached is the merged patch from all the great comments I received.  I
> > > have also rebuilt the docs with the updated patch:
> > >
> > > https://momjian.us/tmp/pgsql/
> > >
> >
> > +   RELEASE SAVEPOINT also subcommits and destroys
> > +   all savepoints that were established after the named savepoint was
> > +   established. This means that any subtransactions of the named savepoint
> > +   will also be subcommitted and destroyed.
> >
> > Wonder if we should be more explicit that data changes are preserved,
> > not destroyed... something like:
> > "This means that any changes within subtransactions of the named
> > savepoint will be subcommitted and those subtransactions will be
> > destroyed."
>
> Good point.  I reread the section and there was just too much confusion
> over subtransactions, partly because the behavior just doesn't map
> easily to subtransaction.  I therefore merged all three paragraphs into
> one and tried to make the text saner;  release_savepoint.sgml diff
> attached, URL content updated.

Just got around to reading this, thanks for changes.

The rewording doesn't work for me. The use of the word "destroy" is
very misleading, since the effect is to commit.

So I think we must avoid use of the word destroy completely. Possible
rewording:

RELEASE SAVEPOINT will subcommit the subtransaction
established by the named savepoint, releasing any resources held by
it. If there were any subtransactions created by the named savepoint,
these will also be subcommitted.

The point is that savepoints create subtransactions, but they are not
the only way to create them, so we cannot equate savepoint and
subtransaction completely.
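
A worked example of that wording (illustrative only; table t is
hypothetical):

BEGIN;
SAVEPOINT a;                 -- starts subtransaction a
INSERT INTO t VALUES (1);
SAVEPOINT b;                 -- starts subtransaction b, inside a
INSERT INTO t VALUES (2);
RELEASE SAVEPOINT a;         -- subcommits b, then a; both rows kept
COMMIT;                      -- both rows now commit with the top level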

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: New docs chapter on Transaction Management and related changes

2022-10-14 Thread Simon Riggs
On Fri, 14 Oct 2022 at 08:55, Simon Riggs  wrote:

> In related changes, I've also done some major rewording of the RELEASE
> SAVEPOINT command, since it was incorrectly described as having "no
> other user visible behavior". A complex example is given to explain,
> using the terminology established in the main patch.

ROLLBACK TO SAVEPOINT also needs some clarification, patch attached.

(Commentary: It's confusing to me that ROLLBACK TO SAVEPOINT starts a
new subtransaction, whereas RELEASE SAVEPOINT does not. You might
imagine they would both start a new subtransaction, but that is not
the case.)
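
To illustrate what I mean (this is current behavior; table t is
hypothetical):

BEGIN;
SAVEPOINT s;
INSERT INTO t VALUES (1);
ROLLBACK TO SAVEPOINT s;     -- discards the insert, but starts a new
                             -- subtransaction, so s remains usable
INSERT INTO t VALUES (2);
ROLLBACK TO SAVEPOINT s;     -- works again; row 2 discarded too
COMMIT;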

-- 
Simon Riggs    http://www.EnterpriseDB.com/


doc-rollback-to.v1.patch
Description: Binary data


Re: New docs chapter on Transaction Management and related changes

2022-10-14 Thread Simon Riggs
On Fri, 14 Oct 2022 at 09:46, Alvaro Herrera  wrote:
>
> +1 for this new chapter.  This latest patch looks pretty good.  I think
> that introducing the concept of "sub-commit" as in Simon's follow-up
> patch clarifies things, though the word itself looks very odd.  Maybe
> it's okay.  The addition of the savepoint example looks good also.
>
> On 2022-Oct-13, Bruce Momjian wrote:
>
> > +  
> > +   PostgreSQL supports a two-phase commit (2PC)
> > +   protocol that allows multiple distributed systems to work together
> > +   in a transactional manner.  The commands are PREPARE
> > +   TRANSACTION, COMMIT PREPARED and
>
> I suggest/request that we try to avoid breaking tagged constants in
> DocBook; doing so makes it much easier to miss them later when grepping
> for them (don't laugh, it has happened to me).  Also, it breaks
> formatting in some weird cases.  I know this makes editing a bit harder
> because you can't just reflow with your editor like you would normal
> text.  So this'd be:
>
> +   in a transactional manner.  The commands are PREPARE 
> TRANSACTION,
> +   COMMIT PREPARED and
>
> with whatever word wrapping you like, except breaking between PREPARE
> and TRANSACTION.
>
> > +   
> > +In addition to vxid and xid,
> > +when a transaction is prepared it is also identified by a Global
> > +Transaction Identifier (GID). GIDs
> > +are string literals up to 200 bytes long, which must be
> > +unique amongst other currently prepared transactions.
> > +The mapping of GID to xid is shown in  > +
> > linkend="view-pg-prepared-xacts">pg_prepared_xacts.
> > +   
>
> Maybe say "is prepared for two-phase commit", to make the topic of this
> paragraph more obvious?
>
> > +   
> > +Lock waits on table-level locks are shown waiting for
> > +virtualxid, while lock waits on row-level
> > +locks are shown waiting for transactionid.
> > +Row-level read and write locks are recorded directly in locked
> > +rows and can be inspected using the 
> > +extension.  Row-level read locks might also require the assignment
> > +of multixact IDs (mxid). Mxids are recorded in
> > +the pg_multixact directory.
> > +   
>
> Hmm, ok.
>
> > +   
> > +The parent xid of each subxid is recorded in the
> > +pg_subtrans directory. No entry is made for
> > +top-level xids since they do not have a parent, nor is an entry made
> > +for read-only subtransactions.
> > +   
>
> Maybe say "the immediate parent xid of each ...", or is it too obvious?

+1 to all of those suggestions

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: New docs chapter on Transaction Management and related changes

2022-10-14 Thread Simon Riggs
On Thu, 13 Oct 2022 at 23:06, Bruce Momjian  wrote:
>
> Thanks, updated patch attached.  You can see the output at:

Thank you for your work to tighten and clean up this patch; much appreciated.

I had two minor typos, plus a slight rewording to avoid using the word
"global" in the last section, since that is associated with
distributed or 2PC transactions. For those changes, I provide a
patch-on-patch so you can see clearly.

In related changes, I've also done some major rewording of the RELEASE
SAVEPOINT command, since it was incorrectly described as having "no
other user visible behavior". A complex example is given to explain,
using the terminology established in the main patch.

-- 
Simon Riggs    http://www.EnterpriseDB.com/


xact.patch-on-patch
Description: Binary data


doc-release-savepoint.v1.patch
Description: Binary data


Re: Startup process on a hot standby crashes with an error "invalid memory alloc request size 1073741824" while replaying "Standby/LOCK" records

2022-10-10 Thread Simon Riggs
On Wed, 5 Oct 2022 at 16:30, Tom Lane  wrote:
>
> Kyotaro Horiguchi  writes:
> > At Tue, 4 Oct 2022 17:15:31 -0700, Nathan Bossart 
> >  wrote in
> >> On Tue, Oct 04, 2022 at 07:53:11PM -0400, Tom Lane wrote:
> >>> After further thought, maybe it'd be better to do it as attached,
> >>> with one long-lived hash table for all the locks.
>
> > First one is straight forward outcome from the current implement but I
> > like the new one.  I agree that it is natural and that the expected
> > overhead per (typical) transaction is lower than both the first one
> > and doing the same operation on a list. I don't think that space
> > inefficiency in that extent doesn't matter since it is the startup
> > process.
>
> To get some hard numbers about this, I made a quick hack to collect
> getrusage() numbers for the startup process (patch attached for
> documentation purposes).  I then ran the recovery/t/027_stream_regress.pl
> test a few times and collected the stats (also attached).  This seems
> like a reasonably decent baseline test, since the core regression tests
> certainly take lots of AccessExclusiveLocks what with all the DDL
> involved, though they shouldn't ever take large numbers at once.  Also
> they don't run long enough for any lock list bloat to occur, so these
> numbers don't reflect a case where the patches would provide benefit.
>
> If you look hard, there's maybe about a 1% user-CPU penalty for patch 2,
> although that's well below the run-to-run variation so it's hard to be
> sure that it's real.  The same comments apply to the max resident size
> stats.  So I'm comforted that there's not a significant penalty here.
>
> I'll go ahead with patch 2 if there's not objection.

Happy to see this change.

> One other point to discuss: should we consider back-patching?  I've
> got mixed feelings about that myself.  I don't think that cases where
> this helps significantly are at all mainstream, so I'm kind of leaning
> to "patch HEAD only".

It looks fine to eventually backpatch, since StandbyReleaseLockTree()
was optimized to be called only when the transaction had actually taken
some AccessExclusiveLocks.

The performance loss is minor and isolated to users of such locks, so I
see no problems with it.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-09-26 Thread Simon Riggs
On Fri, 16 Sept 2022 at 13:20, Simon Riggs  wrote:
>
> Thanks for the review.
>
> v10 attached

v11 attached, corrected for recent commit
14ff44f80c09718d43d853363941457f5468cc03.

-- 
Simon Riggs    http://www.EnterpriseDB.com/


002_minimize_calls_to_SubTransSetParent.v11.patch
Description: Binary data


Re: Pruning never visible changes

2022-09-22 Thread Simon Riggs
On Thu, 22 Sept 2022 at 15:16, Alvaro Herrera  wrote:
>
> On 2022-Sep-22, Simon Riggs wrote:
>
> > On Mon, 19 Sept 2022 at 00:16, Greg Stark  wrote:
>
> > > VACUUM was willing to remove a committed-dead tuple immediately if it 
> > > was
> > > deleted by the same transaction that inserted it.  The idea is that 
> > > such a
> > > tuple could never have been visible to any other transaction, so we 
> > > don't
> > > need to keep it around to satisfy MVCC snapshots.  However, there was
> > > already an exception for tuples that are part of an update chain, and 
> > > this
> > > exception created a problem: we might remove TOAST tuples (which are 
> > > never
> > > part of an update chain) while their parent tuple stayed around (if 
> > > it was
> > > part of an update chain).  This didn't pose a problem for most things,
> > > since the parent tuple is indeed dead: no snapshot will ever consider 
> > > it
> > > visible.  But MVCC-safe CLUSTER had a problem, since it will try to 
> > > copy
> > > RECENTLY_DEAD tuples to the new table.  It then has to copy their 
> > > TOAST
> > > data too, and would fail if VACUUM had already removed the toast 
> > > tuples.
>
> > Good research Greg, thank you. Only took 10 years for me to notice it
> > was gone ;-)
>
> But this begs the question: is the proposed change safe, given that
> ancient consideration?  I don't think TOAST issues have been mentioned
> in this thread so far, so I wonder if there is a test case that verifies
> that this problem doesn't occur for some reason.

Oh, completely agreed.

I will submit a modified patch that adds a test case and just a
comment to explain why we can't remove such rows.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Pruning never visible changes

2022-09-22 Thread Simon Riggs
On Mon, 19 Sept 2022 at 00:16, Greg Stark  wrote:
>
> On Fri, 16 Sept 2022 at 10:27, Tom Lane  wrote:
> >
> > Simon Riggs  writes:
> > > A user asked me whether we prune never visible changes, such as
> > > BEGIN;
> > > INSERT...
> > > UPDATE.. (same row)
> > > COMMIT;
> >
> > Didn't we just have this discussion in another thread?
>
> Well.  not "just" :)
>
> commit 44e4bbf75d56e643b6afefd5cdcffccb68cce414
> Author: Tom Lane 
> Date:   Fri Apr 29 16:29:42 2011 -0400
>
> Remove special case for xmin == xmax in HeapTupleSatisfiesVacuum().
>
> VACUUM was willing to remove a committed-dead tuple immediately if it was
> deleted by the same transaction that inserted it.  The idea is that such a
> tuple could never have been visible to any other transaction, so we don't
> need to keep it around to satisfy MVCC snapshots.  However, there was
> already an exception for tuples that are part of an update chain, and this
> exception created a problem: we might remove TOAST tuples (which are never
> part of an update chain) while their parent tuple stayed around (if it was
> part of an update chain).  This didn't pose a problem for most things,
> since the parent tuple is indeed dead: no snapshot will ever consider it
> visible.  But MVCC-safe CLUSTER had a problem, since it will try to copy
> RECENTLY_DEAD tuples to the new table.  It then has to copy their TOAST
> data too, and would fail if VACUUM had already removed the toast tuples.
>
> Easiest fix is to get rid of the special case for xmin == xmax.  This may
> delay reclaiming dead space for a little bit in some cases, but it's by 
> far
> the most reliable way to fix the issue.
>
> Per bug #5998 from Mark Reid.  Back-patch to 8.3, which is the oldest
> version with MVCC-safe CLUSTER.

Good research Greg, thank you. Only took 10 years for me to notice it
was gone ;-)

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Hash index build performance tweak from sorting

2022-09-21 Thread Simon Riggs
On Wed, 21 Sept 2022 at 02:32, David Rowley  wrote:
>
> On Tue, 2 Aug 2022 at 03:37, Simon Riggs  wrote:
> > Using the above test case, I'm getting a further 4-7% improvement on
> > already committed code with the attached patch, which follows your
> > proposal.
> >
> > The patch passes info via a state object, useful to avoid API churn in
> > later patches.
>
> Hi Simon,
>
> I took this patch for a spin and saw a 2.5% performance increase using
> the random INT test that Tom posted. The index took an average of
> 7227.47 milliseconds on master and 7045.05 with the patch applied.

Hi David,

Thanks for tests and review. I'm just jumping on a plane, so may not
respond in detail until next Mon.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Pruning never visible changes

2022-09-17 Thread Simon Riggs
On Fri, 16 Sept 2022 at 18:37, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > On Fri, 16 Sept 2022 at 15:26, Tom Lane  wrote:
> >> You cannot
> >> do that, at least not without checking that the originating
> >> transaction has no snapshots that could see the older row version.
>
> > I'm saying this is possible only AFTER the end of the originating
> > xact, so there are no issues with additional snapshots.
>
> I see, so the point is just that we can prune even if the originating
> xact hasn't yet passed the global xmin horizon.  I agree that's safe,
> but will it fire often enough to be worth the trouble?

It is an edge case with limited utility, I agree, which is why I
looked for a cheap test (xmin == xmax only).

This additional test is also valid, but too expensive to apply:
TransactionIdGetTopmostTransactionId(xmax) ==
TransactionIdGetTopmostTransactionId(xmin)

> Also, why
> does it need to be restricted to certain shapes of HOT chains ---
> that is, why can't we do exactly what we'd do if the xact *were*
> past the xmin horizon?

I thought it important to maintain the integrity of the HOT chain, in
that the xmax of one tuple version matches the xmin of the next. So
pruning only from the root of the chain allows us to maintain that
validity check.

I'm on the fence myself, which is why I didn't post it immediately
after I had written it.

If we decide not to do this, it would be useful to add a note to the
code comments explaining why we don't.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Pruning never visible changes

2022-09-17 Thread Simon Riggs
On Fri, 16 Sept 2022 at 21:07, Laurenz Albe  wrote:
>
> On Fri, 2022-09-16 at 10:26 -0400, Tom Lane wrote:
> > Simon Riggs  writes:
> > > A user asked me whether we prune never visible changes, such as
> > > BEGIN;
> > > INSERT...
> > > UPDATE.. (same row)
> > > COMMIT;
> >
> > Didn't we just have this discussion in another thread?
>
> For reference: that was
> https://postgr.es/m/f6a491b32cb44bb5daaafec835364f7149348fa1.ca...@cybertec.at

Thanks. I confirm I hadn't seen that, and indeed, I wrote the patch on
5 Sept before you posted.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Slow standby snapshot

2022-09-17 Thread Simon Riggs
On Fri, 16 Sept 2022 at 17:08, Michail Nikolaev
 wrote:
>
> Hello everyone.
>
> To find the best frequency for calling KnownAssignedXidsCompress in
> Simon's patch, I made a set of benchmarks. It looks like each 8th xid
> is a little bit often.
>
> Setup and method is the same as previous (1). 16-core machines,
> max_connections = 5000. Tests were running for about a day, 220 runs
> in total (each version for 20 times, evenly distributed throughout the
> day).
>
> Staring from 60th second, 30 seconds-long transaction was started on primary.
>
> Graphs in attachment. So, looks like 64 is the best value here. It
> gives even a little bit more TPS than smaller values.
>
> Yes, such benchmark does not cover all possible cases, but it is
> better to measure at least something when selecting constants :)

This is very good and clear, thank you.


> If someone has an idea of different benchmark scenarios - please share them.

> So, updated version (with 64 and some commit message) in attachment too.
>
> [1]: 
> https://www.postgresql.org/message-id/flat/CANtu0ohzBFTYwdLtcanWo4%2B794WWUi7LY2rnbHyorJdE8_ZnGg%40mail.gmail.com#379c1be7b8134ada5a574078d51b64c6

I've cleaned up the comments and used a #define for the constant.

IMHO this is ready for commit. I will add it to the next CF.
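
For anyone skimming the thread, the heuristic boils down to this (a
minimal sketch with illustrative names; the real patch hooks this into
the KnownAssignedXids machinery and differs in detail):

/* Compress at most once per KAX_COMPRESS_FREQUENCY removed xids. */
#define KAX_COMPRESS_FREQUENCY 64  /* best value in the benchmarks above */

static int xids_removed_since_compress = 0;

static void
maybe_compress_known_assigned_xids(void)
{
    /* Amortize the O(N) compaction: run it once per 64 removals. */
    if (++xids_removed_since_compress >= KAX_COMPRESS_FREQUENCY)
    {
        xids_removed_since_compress = 0;
        KnownAssignedXidsCompress(true);   /* force-style arg is assumed */
    }
}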

-- 
Simon Riggs    http://www.EnterpriseDB.com/


v8c-new-heuristic-to-compress-KnownAssignedXids.patch
Description: Binary data


Re: Pruning never visible changes

2022-09-16 Thread Simon Riggs
On Fri, 16 Sept 2022 at 15:26, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > A user asked me whether we prune never visible changes, such as
> > BEGIN;
> > INSERT...
> > UPDATE.. (same row)
> > COMMIT;
>
> Didn't we just have this discussion in another thread?

Not that I was aware of, but it sounds like a different case anyway.

> You cannot
> do that, at least not without checking that the originating
> transaction has no snapshots that could see the older row version.

I'm saying this is possible only AFTER the end of the originating
xact, so there are no issues with additional snapshots.

i.e. the never visible row has to be judged RECENTLY_DEAD before we even check.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Pruning never visible changes

2022-09-16 Thread Simon Riggs
A user asked me whether we prune never visible changes, such as

BEGIN;
INSERT...
UPDATE.. (same row)
COMMIT;

Once committed, the original insert is no longer visible to anyone, so
"ought to be able to be pruned", sayeth the user. And they also say
that changing the app is much harder, as ever.

After some thought: yes, we can prune, but not in all cases - only if
the never-visible tuple is at the root end of the update chain. The
only question is whether that can be done cheaply enough to bother
with. The answer in one specific case is yes; in other cases, no.

This patch adds a new test for this use case, and code to remove the
never visible row when the changes are made by the same xid.

(I'm pretty sure there used to be a test for this some years back and
I'm guessing it was removed because it isn't always possible to remove
the tuple, which this new patch honours.)
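
The shape of the check is roughly this (an illustrative sketch, not
the patch text; the real code must also confirm the tuple is at the
root of its update chain and that the inserting xact has ended):

/* xmin and the updater are the same xid: nobody else ever saw this row */
if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
                        HeapTupleHeaderGetUpdateXid(tuple)) &&
    TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple)))
    return HEAPTUPLE_DEAD;     /* never visible, so prunable */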

Please let me know what you think.

--
Simon Riggs    http://www.EnterpriseDB.com/


never_visible.v1.patch
Description: Binary data


Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-09-16 Thread Simon Riggs
On Thu, 15 Sept 2022 at 12:36, Japin Li  wrote:
>
>
> On Thu, 15 Sep 2022 at 18:04, Simon Riggs  
> wrote:
> > On Wed, 14 Sept 2022 at 15:21, Alvaro Herrera  
> > wrote:
> >>
> >> On 2022-Aug-30, Simon Riggs wrote:
> >>
> >> > 001_new_isolation_tests_for_subxids.v3.patch
> >> > Adds new test cases to master without adding any new code, specifically
> >> > addressing the two areas of code that are not tested by existing tests.
> >> > This gives us a baseline from which we can do test driven development.
> >> > I'm hoping this can be reviewed and committed fairly smoothly.
> >>
> >> I gave this a quick run to confirm the claimed increase of coverage.  It
> >> checks out, so pushed.
> >
> > Thank you.
> >
> > So now we just have the main part of the patch, reattached here for
> > the auto patch tester's benefit.
>
> Hi Simon,
>
> Thanks for the updated patch, here are some comments.

Thanks for your comments.

> There is a typo, 
> s/SubTransGetTopMostTransaction/SubTransGetTopmostTransaction/g.
>
> +*call SubTransGetTopMostTransaction() if that xact 
> overflowed;
>
>
> Is there a punctuation mark missing on the following first line?
>
> +* 2. When IsolationIsSerializable() we sometimes need to 
> access topxid
> +*This occurs only when SERIALIZABLE is requested by app 
> user.
>
>
> When we use function name in comments, some places we use parentheses,
> but others do not use it. Why? I think, we should keep them consistent,
> at least in the same commit.
>
> +* 3. When TransactionIdSetTreeStatus will use a status of 
> SUB_COMMITTED,
> +*which then requires us to consult subtrans to find 
> parent, which
> +*is needed to avoid race condition. In this case we ask 
> Clog/Xact
> +*module if TransactionIdsAreOnSameXactPage(). Since we 
> start a new
> +*clog page every 32000 xids, this is usually <<1% of 
> subxids.

I've reworded those comments, hoping to address all of your above points.

> Maybe we declaration a topxid to avoid calling GetTopTransactionId()
> twice when we should set subtrans parent?
>
> +   TransactionId subxid = 
> XidFromFullTransactionId(s->fullTransactionId);
> +   TransactionId topxid = GetTopTransactionId();
>   ...
> +   if (MyProc->subxidStatus.overflowed ||
> +   IsolationIsSerializable() ||
> +   !TransactionIdsAreOnSameXactPage(topxid, subxid))
> +   {
>   ...
> +   SubTransSetParent(subxid, topxid);
> +   }

Seems a minor point, but I've done this anyway.
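
So the hunk now reads roughly like this (based on the quoted v9 code;
the only change is calling GetTopTransactionId() once):

TransactionId subxid = XidFromFullTransactionId(s->fullTransactionId);
TransactionId topxid = GetTopTransactionId();

if (MyProc->subxidStatus.overflowed ||
    IsolationIsSerializable() ||
    !TransactionIdsAreOnSameXactPage(topxid, subxid))
    SubTransSetParent(subxid, topxid);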

Thanks for the review.

v10 attached

--
Simon Riggs    http://www.EnterpriseDB.com/


002_minimize_calls_to_SubTransSetParent.v10.patch
Description: Binary data

