Re: [HACKERS] [PATCH] Logical decoding timeline following take II

Andres Freund Sat, 12 Nov 2016 07:08:58 -0800

On 2016-10-24 17:49:13 +0200, Petr Jelinek wrote:
> + * Determine which timeline to read an xlog page from and set the
> + * XLogReaderState's state->currTLI to that timeline ID.


"XLogReaderState's state->currTLI" - the state's a bit redundant.

> + * The caller must also make sure it doesn't read past the current redo 
> pointer
> + * so it doesn't fail to notice that the current timeline became historical.
> + */

Not sure what that means? The redo pointer usually menas the "logical
start" of the last checkpoint, but I don't see how thta could be meant
here?

> +static void
> +XLogReadDetermineTimeline(XLogReaderState *state, XLogRecPtr wantPage, 
> uint32 wantLength)
> +{
> +     const XLogRecPtr lastReadPage = state->readSegNo * XLogSegSize + 
> state->readOff;
> +
> +     elog(DEBUG4, "Determining timeline for read at %X/%X+%X",
> +                     (uint32)(wantPage>>32), (uint32)wantPage, wantLength);

Doing this on every single read from a page seems quite verbose.   It's
also (like most or all the following debug elogs) violating the casing
prescribed by the message style guidelines.


> +     /*
> +      * If the desired page is currently read in and valid, we have nothing 
> to do.
> +      *
> +      * The caller should've ensured that it didn't previously advance 
> readOff
> +      * past the valid limit of this timeline, so it doesn't matter if the 
> current
> +      * TLI has since become historical.
> +      */
> +     if (lastReadPage == wantPage &&
> +             state->readLen != 0 &&
> +             lastReadPage + state->readLen >= wantPage + 
> Min(wantLength,XLOG_BLCKSZ-1))
> +     {
> +             elog(DEBUG4, "Wanted data already valid"); //XXX
> +             return;
> +     }

With that kind of comment/XXX present, this surely can't be ready for
committer?


> +     /*
> +      * If we're reading from the current timeline, it hasn't become 
> historical
> +      * and the page we're reading is after the last page read, we can again
> +      * just carry on. (Seeking backwards requires a check to make sure the 
> older
> +      * page isn't on a prior timeline).
> +      */

How can ThisTimeLineID become historical?

> +     if (state->currTLI == ThisTimeLineID && wantPage >= lastReadPage)
> +     {
> +             Assert(state->currTLIValidUntil == InvalidXLogRecPtr);
> +             elog(DEBUG4, "On current timeline");
> +             return;
> +     }

Also, is it actually ok to rely on ThisTimeLineID here? That's IIRC not
maintained outside of the startup process, until recovery ended (cf it
being set in InitXLOGAccess() called via RecoveryInProgress()).


>               /*
> -              * TODO: we're going to have to do something more intelligent 
> about
> -              * timelines on standbys. Use readTimeLineHistory() and
> -              * tliOfPointInHistory() to get the proper LSN? For now we'll 
> catch
> -              * that case earlier, but the code and TODO is left in here for 
> when
> -              * that changes.
> +              * Check which timeline to get the record from.
> +              *
> +              * We have to do it each time through the loop because if we're 
> in
> +              * recovery as a cascading standby, the current timeline 
> might've
> +              * become historical.
>                */

I guess you mean cascading as "replicating logically from a physical
standby"? I don't think it's good to use cascading here, because that's
usually thought to mean doing physical SR from a standby...



> diff --git a/src/backend/replication/logical/logicalfuncs.c 
> b/src/backend/replication/logical/logicalfuncs.c
> index 318726e..4315fb3 100644
> --- a/src/backend/replication/logical/logicalfuncs.c
> +++ b/src/backend/replication/logical/logicalfuncs.c
> @@ -234,12 +234,6 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo 
> fcinfo, bool confirm, bool bin
>       rsinfo->setResult = p->tupstore;
>       rsinfo->setDesc = p->tupdesc;
>  
> -     /* compute the current end-of-wal */
> -     if (!RecoveryInProgress())
> -             end_of_wal = GetFlushRecPtr();
> -     else
> -             end_of_wal = GetXLogReplayRecPtr(NULL);
> -
>       ReplicationSlotAcquire(NameStr(*name));
>  
>       PG_TRY();
> @@ -279,6 +273,12 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo 
> fcinfo, bool confirm, bool bin
>               /* invalidate non-timetravel entries */
>               InvalidateSystemCaches();
>  
> +             if (!RecoveryInProgress())
> +                     end_of_wal = GetFlushRecPtr();
> +             else
> +                     end_of_wal = GetXLogReplayRecPtr(NULL);
> +
> +             /* Decode until we run out of records */
>               while ((startptr != InvalidXLogRecPtr && startptr < end_of_wal) 
> ||
>                          (ctx->reader->EndRecPtr != InvalidXLogRecPtr && 
> ctx->reader->EndRecPtr < end_of_wal))
>               {

That seems like a pretty random change?


> diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
> index deaa7f5..caff9a6 100644
> --- a/src/include/access/xlogreader.h
> +++ b/src/include/access/xlogreader.h
> @@ -27,6 +27,10 @@
>  
>  #include "access/xlogrecord.h"
>  
> +#ifndef FRONTEND
> +#include "nodes/pg_list.h"
> +#endif

Why is that needed?
> diff --git a/src/test/modules/test_slot_timelines/README 
> b/src/test/modules/test_slot_timelines/README
> new file mode 100644
> index 0000000..585f02f
> --- /dev/null
> +++ b/src/test/modules/test_slot_timelines/README
> @@ -0,0 +1,19 @@
> +A test module for logical decoding failover and timeline following.
> +
> +This module provides a minimal way to maintain logical slots on replicas
> +that mirror the state on the master. It doesn't make decoding possible,
> +just tracking slot state so that a decoding client that's using the master
> +can follow a physical failover to the standby. The master doesn't know
> +about the slots on the standby, they're synced by a client that connects
> +to both.
> +
> +This is intentionally not part of the test_decoding module because that's 
> meant
> +to serve as example code, where this module exercises internal server 
> features
> +by unsafely exposing internal state to SQL. It's not the right way to do
> +failover, it's just a simple way to test it from the perl TAP framework to
> +prove the feature works.
> +
> +In a practical implementation of this approach a bgworker on the master would
> +monitor slot positions and relay them to a bgworker on the standby that 
> applies
> +the position updates without exposing slot internals to SQL. That's too 
> complex
> +for this test framework though.

I still don't want to have this code in postgres, it's just an example
for doing something bad.  Lets get proper feedback control in, and go
from there.

> diff --git 
> a/src/test/modules/test_slot_timelines/expected/load_extension_1.out 
> b/src/test/modules/test_slot_timelines/expected/load_extension_1.out
> new file mode 100644
> index 0000000..0db21e4
> --- /dev/null
> +++ b/src/test/modules/test_slot_timelines/expected/load_extension_1.out
> @@ -0,0 +1,7 @@
> +CREATE EXTENSION test_slot_timelines;
> +SELECT test_slot_timelines_create_logical_slot('test_slot', 'test_decoding');
> +ERROR:  replication slots can only be used if max_replication_slots > 0
> +SELECT test_slot_timelines_advance_logical_slot('test_slot', 
> txid_current()::text::xid, txid_current()::text::xid, 
> pg_current_xlog_location(), pg_current_xlog_location());
> +ERROR:  replication slots can only be used if max_replication_slots > 0
> +SELECT pg_drop_replication_slot('test_slot');
> +ERROR:  replication slots can only be used if max_replication_slots > 0
> diff --git a/src/test/modules/test_slot_timelines/sql/load_extension.sql 
> b/src/test/modules/test_slot_timelines/sql/load_extension.sql
> new file mode 100644
> index 0000000..2440355

It doesn't seem worthwhlie to maintain a test for this - why do we run
the tests with those settings in the first place?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Logical decoding timeline following take II

Reply via email to