Re: [HACKERS] Inconsistent DB data in Streaming Replication

Florian Pflug Mon, 15 Apr 2013 00:32:39 -0700

On Apr 14, 2013, at 17:56, Fujii Masao wrote:
> At fast shutdown, after walsender sends the checkpoint record and
> closes the replication connection, walreceiver can detect the close
> of connection before receiving all WAL records. This means that,
> even if walsender sends all WAL records, walreceiver cannot always
> receive all of them.


That sounds like a bug in walreceiver to me.

The following code in walreceiver's main loop looks suspicious:

  /*
   * Process the received data, and any subsequent data we
   * can read without blocking.
   */
  for (;;)
  {
    if (len > 0)
    {
      /* Something was received from master, so reset timeout */
      ...
      XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
    }
    else if (len == 0)
      break;
    else if (len < 0)
    {
      ereport(LOG,
          (errmsg("replication terminated by primary server"),
           errdetail("End of WAL reached on timeline %u at %X/%X",
                 startpointTLI,
                 (uint32) (LogstreamResult.Write >> 32),
                 (uint32) LogstreamResult.Write)));
      ...
    }
    len = walrcv_receive(0, &buf);
  }

  /* Let the master know that we received some data. */
  XLogWalRcvSendReply(false, false);

  /*
   * If we've written some records, flush them to disk and
   * let the startup process and primary server know about
   * them.
   */   
  XLogWalRcvFlush(false);

The loop at the top looks fine: it specifically avoids throwing
an error on EOF. But the code then proceeds to XLogWalRcvSendReply(),
which doesn't seem to be equally careful. It simply does

  if (PQputCopyData(streamConn, buffer, nbytes) <= 0 ||  
      PQflush(streamConn))
      ereport(ERROR,
              (errmsg("could not send data to WAL stream: %s",
                      PQerrorMessage(streamConn))));

Unless I'm missing something, that certainly seems to explain
how a standby can lag behind even after a controlled shutdown of
the master.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list