Re: [Patch] ALTER SYSTEM READ ONLY

Amit Kapila Wed, 17 Jun 2020 06:03:21 -0700

On Tue, Jun 16, 2020 at 7:26 PM amul sul wrote:
>
> Hi,
>
> The attached patch proposes the $Subject feature, which forces the system
> into read-only mode, where inserting write-ahead log is prohibited until
> ALTER SYSTEM READ WRITE is executed.
>
> The high-level goal is to make the availability/scale-out situation better.
> The feature will help HA setups where the master server needs to stop
> accepting WAL writes immediately and kick out any transaction expecting to
> write WAL at the end, in case of a network outage on the master or
> replication connection failures.
>
> For example, this feature allows for a controlled switchover without needing
> to shut down the master. You can instead make the master read-only, wait
> until the standby catches up, and then promote the standby. The master
> remains available for read queries throughout, and also for WAL streaming,
> but without the possibility of any new write transactions. After switchover
> is complete, the master can be shut down and brought back up as a standby
> without needing to use pg_rewind. (Eventually, it would be nice to be able
> to make the read-only master into a standby without having to restart it,
> but that is a problem for another patch.)
>
> This might also help in failover scenarios. For example, if you detect that
> the master has lost network connectivity to the standby, you might make it
> read-only after 30 s, and promote the standby after 60 s, so that you never
> have two writable masters at the same time. In this case, there's still some
> split-brain, but it's still better than what we have now.
>
> Design:
> ----------
> The proposed feature is built atop the super barrier mechanism commit[1] to
> coordinate global state changes across all active backends.  A backend that
> executes the ALTER SYSTEM READ { ONLY | WRITE } command places a request
> with the checkpointer process to change the WAL read/write state to the
> requested one, aka the WAL prohibited or WAL permitted state respectively.
> When the checkpointer process sees the WAL prohibit state change request, it
> emits a global barrier and waits until all backends that participate in
> ProcSignal absorb it. Once that is done, the WAL read/write state in shared
> memory and the control file is updated so that XLogInsertAllowed() returns
> accordingly.
>
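
To make the flow described above concrete, here is a toy Python model of it.
This is only a sketch under assumptions, not the patch's code: Cluster,
Backend, alter_system() and checkpointer_step() are made-up names, the
barrier is modelled as a simple generation counter, and real backends would
absorb the barrier asynchronously rather than in a loop.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class WalState(Enum):
    PERMITTED = "read-write"
    PROHIBITED = "read-only"


@dataclass
class Backend:
    name: str
    seen_generation: int = 0  # last barrier generation this backend absorbed

    def absorb(self, generation: int) -> None:
        self.seen_generation = generation


@dataclass
class Cluster:
    backends: List[Backend] = field(default_factory=list)
    shared_state: WalState = WalState.PERMITTED   # stands in for shared memory
    control_file: WalState = WalState.PERMITTED   # stands in for the control file
    requested: Optional[WalState] = None          # pending request, if any
    generation: int = 0                           # barrier generation counter

    def alter_system(self, target: WalState) -> None:
        """A backend only files a request; the checkpointer applies it."""
        self.requested = target

    def checkpointer_step(self) -> bool:
        """Process one pending request. Returns True if a change was applied."""
        if self.requested is None:
            return False
        # Emit the barrier and wait until every participating backend absorbs it.
        self.generation += 1
        for backend in self.backends:
            backend.absorb(self.generation)
        if any(b.seen_generation != self.generation for b in self.backends):
            return False
        # Only now publish the new state, in shared memory and on disk.
        self.shared_state = self.requested
        self.control_file = self.requested
        self.requested = None
        return True

    def xlog_insert_allowed(self) -> bool:
        return self.shared_state is WalState.PERMITTED
```

In this model, calling alter_system(WalState.PROHIBITED) alone changes
nothing visible; xlog_insert_allowed() flips only after checkpointer_step()
has seen every backend absorb the barrier.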


Do we prohibit the checkpointer from writing dirty pages and writing a
checkpoint record as well?  If so, will the checkpointer process write
the current dirty pages and write a checkpoint record, or do we skip
that as well?

> If there are open transactions that have acquired an XID, the sessions are
> killed before the barrier is absorbed.
>

What about prepared transactions?

> They can't commit without writing WAL, and they can't abort without writing
> WAL, either, so we must at least abort the transaction. We don't necessarily
> need to kill the session, but it's hard to avoid in all cases because (1) if
> there are subtransactions active, we need to force the top-level abort
> record to be written immediately, but we can't really do that while keeping
> the subtransactions on the transaction stack, and (2) if the session is
> idle, we also need the top-level abort record to be written immediately, but
> can't send an error to the client until the next command is issued without
> losing wire protocol synchronization. For now, we just use FATAL to kill the
> session; maybe this can be improved in the future.
>
> Open transactions that don't have an XID are not killed, but will get an
> ERROR if they try to acquire an XID later, or if they try to write WAL
> without acquiring an XID (e.g. VACUUM).
>
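
The two session rules quoted above (FATAL for sessions that already hold an
XID, ERROR on a later XID assignment for the rest) can be sketched as a toy
Python model. Session, on_read_only_transition() and assign_xid() are
hypothetical names used only for illustration; nothing here is taken from
the patch itself.

```python
from dataclasses import dataclass
from typing import Optional


class WalProhibitedError(Exception):
    """Stands in for an ordinary ERROR raised while WAL is prohibited."""


@dataclass
class Session:
    pid: int
    xid: Optional[int] = None  # None means no XID has been assigned yet
    alive: bool = True


def on_read_only_transition(session: Session) -> str:
    """Apply the read-only transition to one session.

    A session that already holds an XID cannot commit or abort without
    writing WAL, so it is terminated (FATAL). Others are left alone.
    """
    if session.xid is not None:
        session.alive = False
        return "FATAL"
    return "ok"


def assign_xid(session: Session, next_xid: int, wal_prohibited: bool) -> int:
    """Assign an XID, or fail with an ERROR if the system is read-only."""
    if wal_prohibited:
        raise WalProhibitedError("cannot assign XID while WAL is prohibited")
    session.xid = next_xid
    return next_xid
```

A surviving session keeps running read-only queries; it only fails at the
point where it would first need an XID.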

What if vacuum is on an unlogged relation?  Do we allow writes via
vacuum to an unlogged relation?

> To make that happen, the patch adds a new coding rule: a critical section
> that will write WAL must be preceded by a call to CheckWALPermitted(),
> AssertWALPermitted(), or AssertWALPermitted_HaveXID(). The latter variants
> are used when we know for certain that inserting WAL here must be OK, either
> because we have an XID (we would have been killed by a change to read-only
> if one had occurred) or for some other reason.
>
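
The point of checking before the critical section is that in PostgreSQL an
ERROR raised inside a critical section is escalated to PANIC. A toy Python
model of that ordering follows; critical_section(), check_wal_permitted()
and update_page() are illustrative stand-ins, not the patch's functions or
their real signatures.

```python
import contextlib


class WalProhibited(Exception):
    """Stands in for an ordinary, recoverable ERROR."""


class Panic(Exception):
    """Stands in for PANIC: any error inside a critical section."""


@contextlib.contextmanager
def critical_section():
    # Any exception escaping the body is escalated, as a critical section would.
    try:
        yield
    except Exception as exc:
        raise Panic("error inside critical section") from exc


def check_wal_permitted(wal_prohibited: bool) -> None:
    if wal_prohibited:
        raise WalProhibited("WAL is prohibited, system is read-only")


def xlog_insert(log: list, record: str, wal_prohibited: bool) -> None:
    if wal_prohibited:
        raise WalProhibited("cannot insert WAL")
    log.append(record)


def update_page(log: list, record: str, wal_prohibited: bool) -> None:
    # The check runs first, so a read-only system yields a plain ERROR here.
    check_wal_permitted(wal_prohibited)
    with critical_section():
        xlog_insert(log, record, wal_prohibited)


def update_page_unchecked(log: list, record: str, wal_prohibited: bool) -> None:
    # Violates the rule: the failure surfaces inside the critical section.
    with critical_section():
        xlog_insert(log, record, wal_prohibited)
```

With the check in place the caller sees WalProhibited; without it the same
condition surfaces as Panic.
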
> The ALTER SYSTEM READ WRITE command can be used to reverse the effects of
> ALTER SYSTEM READ ONLY. Both ALTER SYSTEM READ ONLY and ALTER
> SYSTEM READ WRITE update not only the shared memory state but also the control
> file, so that changes survive a restart.
>
> The transition between read-write and read-only is a pretty major
> transition, so we emit a log message for each successful execution of an
> ALTER SYSTEM READ {ONLY | WRITE} command. Also, we have added a new GUC
> system_is_read_only which returns "on" when the system is in WAL prohibited
> state or recovery.
>
> Another part of the patch that is quite uneasy and needs discussion is that
> when shutting down in the read-only state we skip the shutdown checkpoint,
> and at restart, startup recovery is performed first and later the read-only
> state is restored to prohibit further WAL writes irrespective of recovery
> checkpoint succeed or not. The concern here is that if this startup recovery
> checkpoint wasn't ok, then it will never happen, even if the system is later
> put back into read-write mode.
>

I am not able to understand this problem.  What do you mean by
"recovery checkpoint succeed or not"?  Do you add a try..catch and skip
any error while performing the recovery checkpoint?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
