Re: PG13 fails to startup even though the current transaction is equal to the target transaction

2021-03-22 Thread Sean Jezewski
Hi Kyotaro -

Thanks for the response.

I think it boils down to your comment:

> I'm not sure.  The direct cause of the "issue" is a promotion trigger
> that came before reaching recovery target.  That won't happen if the
> "someone" doesn't do that.

I think the question is 'under what conditions is it safe to do the
promotion' ?

What is your recommendation in this case? The end of the archive has been
reached. All transactions have been replayed. And in fact the current
transaction id is exactly equal to the target recovery transaction id.

So by all the indicators I can see, this recovery is in fact done. All the
data that should be there is there. All the transactions that I want
replayed have been replayed. (In fact all the transactions in the archive
have been replayed).

If we stop and wait before hitting the promotion trigger, we could wait
indefinitely (if the parent service has no more incoming transactions,
which means no more WAL to replay).

Are you recommending that we wait until another transaction happens on the
parent DB?

Thanks,
Sean

On Mon, Mar 22, 2021 at 3:59 AM Kyotaro Horiguchi 
wrote:

> At Thu, 18 Mar 2021 17:59:34 -0400, Sean Jezewski <
> sjezew...@salesforce.com> wrote in
> > We've noticed what may be a regression / bug in PG13.
> >
> > I work at Heroku on the Data team, where we manage a fleet of Postgres
> > services. This change has resulted in breaking the UX we offer to
> customers
> > to manage their PG services. In particular, ‘forks’ and ‘point in time
> > restores’ seem broken for PG13.
> >
> > I believe it is related to this patch (
> >
> https://www.postgresql.org/message-id/993736dd3f1713ec1f63fc3b653839f5%40lako.no
> > )
> >
> > For PG12, we expect:
> >
> > -- We create a new Postgres service from archive and provide a
> > recovery_target_xid
> > -- PG replays the archive until the end of the archive is reached, and
> the
> > current transaction == recovery_target_xid
> > -- We measure the current transaction via the query SELECT
> > pg_catalog.txid_snapshot_xmax(pg_catalog.txid_current_snapshot())
> > -- Since the current transaction is exactly equal to the target
> > transaction, we perform the promotion
> >
> > For PG12, what we get:
> >
> > -- This process completes smoothly, and the new postgres service is up
> and
> > running
> ...
> > For PG13, what we get:
> >
> > -- On promotion we see the postgres process exit with the following log
> > lines:
> >
> > Mar 17 14:47:49 ip-10-0-146-54 25a9551c_65ec_4870_99e9_df69151984a0[7]:
> > [18-1] sql_error_code = 0 LOG: promote trigger file found:
> > /etc/postgresql/wal-e.d/pull-env/STANDBY_OFF
>
> This means someone other than the server itself has placed that file
> to cause the promotion, perhaps before reaching the target point of
> the recovery.  Even if that happened on PG12, server is uninterested
> in the cause of the recovery stop and happily proceeds to
> promotion. Thus, it is likely that the configured target xid actually
> have not been reached at promotion at least in the PG13 case.
>
> > Mar 17 14:47:49 ip-10-0-146-54 25a9551c_65ec_4870_99e9_df69151984a0[7]:
> > [19-1] sql_error_code = 0 LOG: redo done at 0/60527E0
> ...
> > [21-1] sql_error_code = XX000 FATAL: recovery ended before configured
> > recovery target was reached
>
> In PG13, startup process complains like this even if recovery is
> stopped by operational (or manual) promotion.  There might be other
> behaviors but it seems to be reasonable to give priority on
> configuration in postgresql.conf over on-the-fly operations like
> promotion triggering.
>
> > Even though the transaction IDs are identical. It seems like the end of
> the
> > archive was reached (in fact the last transaction), and while we arrived
> at
> > the correct transaction id, somehow PG decided it wasn’t done replaying?
> > Perhaps because somehow the timestamps don’t line up? (Afaict we do not
> set
> > the recovery_target_time, just the recovery_target_xid)
> >
> > We have the `recovery_target_inclusive` set to true, which is the
> default.
> > It really seems like the intent of that setting means that if the target
> > equals the current transaction ID, recovery should be marked as complete.
> > However we're seeing the opposite. While the current txn id == the target
> > transaction id, the server exits. This is surprising, and doesn't line up
> > with our expected behavior.
>
> So at least the issue raised here doesn't seem relevant to how
> xid-targetted PITR works.
>
> > We have a workaround.
> >
> > Right before pr

PG13 fails to startup even though the current transaction is equal to the target transaction

2021-03-18 Thread Sean Jezewski
We've noticed what may be a regression / bug in PG13.

I work at Heroku on the Data team, where we manage a fleet of Postgres
services. This change has resulted in breaking the UX we offer to customers
to manage their PG services. In particular, ‘forks’ and ‘point in time
restores’ seem broken for PG13.

I believe it is related to this patch (
https://www.postgresql.org/message-id/993736dd3f1713ec1f63fc3b653839f5%40lako.no
)

For PG12, we expect:

-- We create a new Postgres service from archive and provide a
recovery_target_xid
-- PG replays the archive until the end of the archive is reached, and the
current transaction == recovery_target_xid
-- We measure the current transaction via the query SELECT
pg_catalog.txid_snapshot_xmax(pg_catalog.txid_current_snapshot())
-- Since the current transaction is exactly equal to the target
transaction, we perform the promotion

For PG12, what we get:

-- This process completes smoothly, and the new postgres service is up and
running

For PG13, we expect:

-- We create a new Postgres service from archive and provide a
recovery_target_xid
-- PG replays the archive until the end of the archive is reached, and the
current transaction == recovery_target_xid
-- We measure the current transaction via the query SELECT
pg_catalog.txid_snapshot_xmax(pg_catalog.txid_current_snapshot())
-- Since the current transaction is exactly equal to the target
transaction, we perform the promotion

For PG13, what we get:

-- On promotion we see the postgres process exit with the following log
lines:

Mar 17 14:47:49 ip-10-0-146-54 25a9551c_65ec_4870_99e9_df69151984a0[7]:
[18-1] sql_error_code = 0 LOG: promote trigger file found:
/etc/postgresql/wal-e.d/pull-env/STANDBY_OFF
Mar 17 14:47:49 ip-10-0-146-54 25a9551c_65ec_4870_99e9_df69151984a0[7]:
[19-1] sql_error_code = 0 LOG: redo done at 0/60527E0
Mar 17 14:47:49 ip-10-0-146-54 25a9551c_65ec_4870_99e9_df69151984a0[7]:
[20-1] sql_error_code = 0 LOG: last completed transaction was at log
time 2021-03-17 14:42:44.901448+00
Mar 17 14:47:49 ip-10-0-146-54 25a9551c_65ec_4870_99e9_df69151984a0[7]:
[21-1] sql_error_code = XX000 FATAL: recovery ended before configured
recovery target was reached
Mar 17 14:47:49 ip-10-0-146-54 25a9551c_65ec_4870_99e9_df69151984a0[5]:
[8-1] sql_error_code = 0 LOG: startup process (PID 7) exited with exit
code 1

Even though the transaction IDs are identical. It seems like the end of the
archive was reached (in fact the last transaction), and while we arrived at
the correct transaction id, somehow PG decided it wasn’t done replaying?
Perhaps because somehow the timestamps don’t line up? (Afaict we do not set
the recovery_target_time, just the recovery_target_xid)

We have the `recovery_target_inclusive` set to true, which is the default.
It really seems like the intent of that setting means that if the target
equals the current transaction ID, recovery should be marked as complete.
However we're seeing the opposite. While the current txn id == the target
transaction id, the server exits. This is surprising, and doesn't line up
with our expected behavior.

We have a workaround.

Right before promotion, if we increment the transaction of the leader
database (the original PG service that we're forking from) by running
`SELECT pg_catalog.txid_current()`, wait 120 seconds (double our archive
timeout value to allow for the WAL segment to be written / uploaded /
read), and wait until the current transaction is strictly greater than the
target transaction, then the promotion seems to work fine every time for
PG13. But this seems like an off by one error?

What do you think? Is this a bug? Is this expected? Is this user error on
our end?

Thanks!

Sean