Re: [HACKERS] Switching timeline over streaming replication

Amit kapila Sun, 30 Sep 2012 19:26:10 -0700

> On Friday, September 28, 2012 6:38 PM Amit Kapila wrote:
> On Tuesday, September 25, 2012 6:29 PM Heikki Linnakangas wrote:
> On 25.09.2012 10:08, Heikki Linnakangas wrote:
> > On 24.09.2012 16:33, Amit Kapila wrote:
> >> In any case, it will be better if you can split it into multiple
> patches:
> >> 1. Having new functionality of "Switching timeline over streaming
> >> replication"
> >> 2. Refactoring related changes.



> Ok, here you go. xlog-c-split-1.patch contains the refactoring of existing
code, with no user-visible changes.
> streaming-tli-switch-2.patch applies over xlog-c-split-1.patch, and
contains the new functionality.


> Please find the initial review of the patch. Still more review is pending,
> but I thought whatever is done I shall post

Some more review:

11. In function readTimeLineHistory()
    ereport(DEBUG3,
(errmsg_internal("history of timeline %u is %s",
targetTLI, nodeToString(result))));


    Don't nodeToString(result) needs to be changed as it contain list of 
structure TimeLineHistoryEntry


12. In function @@ -3768,6 +3773,8 @@ rescanLatestTimeLine(void)
+ * The current timeline must be found in the history file, and the
+ * next timeline must've forked off from it *after* the current
+ * recovery location.
  */
- if (!list_member_int(newExpectedTLIs,
- (int) recoveryTargetTLI))
- ereport(LOG,
- (errmsg("new timeline %u is not a child of database system timeline %u",
- newtarget,
- ThisTimeLineID)));


    is there any logic in the current patch which ensures that above check is 
not require now?


13. In function @@ -3768,6 +3773,8 @@ rescanLatestTimeLine(void)
+ found = false;
+ foreach (cell, newExpectedTLIs)
..
..
- list_free(expectedTLIs);
+ list_free_deep(expectedTLIs);
    whats the use of the found variable and freeing expectedTLIs in loop might 
cause problem.


14. In function @@ -3768,6 +3773,8 @@ rescanLatestTimeLine(void)
newExpectedTLIs = readTimeLineHistory(newtarget);
Shouldn't this variable be declared as newExpectedTLEs as the list returned by 
readTimeLineHistory contains target list entry


15. StartupXLOG
/* Now we can determine the list of expected TLIs */
expectedTLIs = readTimeLineHistory(recoveryTargetTLI);


Should expectedTLIs be changed to expectedTLEs as the list returned by 
readTimeLineHistory contains target list entry


16.@@ -5254,8 +5252,8 @@ StartupXLOG(void)
writeTimeLineHistory(ThisTimeLineID, recoveryTargetTLI,
- curFileTLI, endLogSegNo, reason);
+ curFileTLI, EndRecPtr, reason);
if endLogSegNo is not used here, it needs to be removd from function 
declaration as well.


17.@@ -5254,8 +5252,8 @@ StartupXLOG(void)
if (InArchiveRecovery)
      ..
      ..
+
+ /* The history file can be archived immediately. */
+ TLHistoryFileName(histfname, ThisTimeLineID);
+ XLogArchiveNotify(histfname);


     Shouldn't this be done archive_mode is on. Right now InArchiveRecovery is 
true even when we do recovery for standby


18. +static bool
+WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, bool 
fetching_ckpt)
{
..
+ if (XLByteLT(RecPtr, receivedUpto))
+ havedata = true;
+ else
+ {
+ XLogRecPtr latestChunkStart;
+
+ receivedUpto = GetWalRcvWriteRecPtr(&latestChunkStart, &receiveTLI);
+ if (XLByteLT(RecPtr, receivedUpto) && receiveTLI == curFileTLI)
+ {
+ havedata = true;
+ if (!XLByteLT(RecPtr, latestChunkStart))
+ {
+ XLogReceiptTime = GetCurrentTimestamp();
+ SetCurrentChunkStartTime(XLogReceiptTime);
+ }
+ }
+ else
+ havedata = false;
+ }


In the above logic, it seems there is inconsistency in setting havedata = true;
In the above code in else loop, let us say cond. receiveTLI == curFileTLI is 
false but XLByteLT(RecPtr, receivedUpto) is true,
then in next round in for loop, the check if (XLByteLT(RecPtr, receivedUpto)) 
will get true and will set havedata = true;
which seems to be contradictory.


19.


+static bool
+WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, bool 
fetching_ckpt)
{
..
+ if (PrimaryConnInfo)
+ {
+ XLogRecPtr ptr = fetching_ckpt ? RedoStartLSN : RecPtr;
+ TimeLineID tli = timelineOfPointInHistory(ptr, expectedTLIs);
+
+ if (tli < curFileTLI)


I think in some cases if (tli < curFileTLI) might not make sense, as for case 
where curFileTLI =0 for randAccess. 




20. Function name WaitForWALToBecomeAvailable suggests that it waits for WAL, 
but it also returns true when trigger file is present,
    which can be little misleading.




21. @@ -2411,27 +2411,6 @@ reaper(SIGNAL_ARGS)

    a. won't it impact stop of online basebackup functionality because earlier 
on promotion
       I think this code will stop walsenders and basebackup stop will also 
give error in such cases.






22. @@ -63,10 +66,17 @@ void
 _PG_init(void)
 {
  /* Tell walreceiver how to reach us */
- if (walrcv_connect != NULL || walrcv_receive != NULL ||
- walrcv_send != NULL || walrcv_disconnect != NULL)
+ if (walrcv_connect != NULL || walrcv_identify_system ||
+ walrcv_readtimelinehistoryfile != NULL ||


   check for walrcv_identify_system != NULL is missing.


23. write the function header for newly added functions 
(libpqrcv_startstreaming, libpqrcv_identify_system, ..)


24. In header of function libpqrcv_receive(), *type needs to be removed.
 + * If data was received, returns the length of the data. *type and *buffer
    

25. 
+timeline_history:
+ K_TIMELINE_HISTORY ICONST
+ {
+ TimeLineHistoryCmd *cmd;
+
+ cmd = makeNode(TimeLineHistoryCmd);
+ cmd->timeline = $2;


can handle invalid timeline error same as for opt_timeline


26.@@ -170,6 +187,7 @@ WalReceiverMain(void)
+ case WALRCV_WAITING:
+ case WALRCV_STREAMING:
  /* Shouldn't happen */
  elog(PANIC, "walreceiver still running according to shared memory state");
elog message should be changed according to new states.


27.@@ -259,8 +281,11 @@ WalReceiverMain(void)
 
  /* Load the libpq-specific functions */
  load_file("libpqwalreceiver", false);
- if (walrcv_connect == NULL || walrcv_receive == NULL ||
- walrcv_send == NULL || walrcv_disconnect == NULL)
+ if (walrcv_connect == NULL || walrcv_startstreaming == NULL ||
+ walrcv_endstreaming == NULL ||
+ walrcv_readtimelinehistoryfile == NULL ||
+ walrcv_receive == NULL || walrcv_send == NULL ||
+ walrcv_disconnect == NULL)


check for walrcv_identify_system is missing.


28.
+/*
+ * Check that we're connected to a valid server using the IDENTIFY_SERVER
+ * replication command, and fetch any timeline history files present in the
+ * master but missing from this server's pg_xlog directory.
+ */
+static void
+WalRcvHandShake(TimeLineID startpointTLI)


In the function header the command name should be IDENTIFY_SYSTEM instead of 
IDENTIFY_SERVER


29. @@ -170,6 +187,7 @@ WalReceiverMain(void)
+ * timeline. In case we've already reached the end of the old timeline,
+ * the server will finish the streaming immediately, and we will
+ * disconnect. If recovery_target_timeline is 'latest', the startup
+ * process will then pg_xlog and find the new history file, bump recovery


a. I think after reaching at end of old timeline rather than discoonect it will 
start from new timeline, 
   which will be updated by startup process.
b. The above line seems to be incorrect, "will then (scan/search) pg_xlog"


30. +static void
+WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI)


/*
+ * nudge startup process to notice that we've stopped streaming and are
+ * now waiting for instructions.
+ */
+ WakeupRecovery();
  for (;;)
  {


In this for loop don't we need to check interrupts or postmaster alive or 
recovery in progress
so that if any other process continues, it should not wait indefinately.


31.+static void
+WalRcvWaitForStartPosition(XLogRecPtr *startpoint, TimeLineID *startpointTLI)


/*
+ * nudge startup process to notice that we've stopped streaming and are
+ * now waiting for instructions.
+ */
+ WakeupRecovery();
  for (;;)
  {


+ SpinLockAcquire(&walrcv->mutex);
+ if (walrcv->walRcvState == WALRCV_STARTING)
  {


I think it can reach WALRCV_STOPPING state also after WALRCV_WAITING from 
shutdown,
so we should check for that state as well.


32.@@ -170,6 +187,7 @@ WalReceiverMain(void)
{
..
..


+ elog(LOG, "walreceiver ended streaming and awaits new instructions");
+ WalRcvWaitForStartPosition(&startpoint, &startpointTLI);


a. After getting new startpoint and tli, it will go again for 
WalRcvHandShake(startpointTLI);
   so here chances are there, it will again fetch the history files from server 
which we have 
   already fetched.
b. Also Identify_system command will run again and get the information such as 
system identifier,
   which is completely redundant at this point. 
   It fetches tli from primary also which I think can't be changed from what 
earlier it has fetched.


33. .@@ -170,6 +187,7 @@ WalReceiverMain(void)
+ for (;;)
+ {
+ if (len > 0)
+ XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
+ else if (len == 0)
+ break;
+ else if (len < 0)
+ {
+ ereport(LOG,
+ (errmsg("replication terminated by primary server"),
+ errdetail("End of WAL reached on timeline %u", startpointTLI)));
+ endofwal = true;
+ break;
+ }
+ len = walrcv_receive(0, &buf);
+ }
+
+ /* Let the master know that we received some data. */
+ XLogWalRcvSendReply();
+
+ /*
+ * If we've written some records, flush them to disk and let the
+ * startup process and primary server know about them.
+ */
+ XLogWalRcvFlush(false);


a. In the above code in for loop, when it breaks due to len < 0, there is no 
need to send reply to master.
b. also when it breaks due to len < 0, there can be 2 reasons, one is end of 
copy mode or primary server has
   disconnected. I think in second case handling should be same as what without 
this feature.
   Not sure if its eventually turning out to be same.


34.
+bool
+WalRcvStreaming(void)
+{


In this function, right now if state is WALRCV_WAITING, then it will return 
false.
I think waiting is better than starting for the matter of checking if 
walreceiver is in progress.
or is state WALRCV_WAITING anytime expected when this function is called, if 
not then we log the
error for invalid state.



With Regards,
Amit Kapila.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Switching timeline over streaming replication

Reply via email to