[GENERAL] Re: Is there possibility btree_redo with XLOG_BTREE_DELETE done between standby_redo and the end of backup

2017-05-25 Thread y39chen
Yeah, I figured out the logic. The precondition is that no connections
should be accepted while recovering. It is clear to me now. Thank you very
much.

static TransactionId
btree_xlog_delete_get_latestRemovedXid(xl_btree_delete *xlrec)
{
..
if (*CountDBBackends(InvalidOid)* == 0)
return latestRemovedXid;

/*
 * In what follows, we have to examine the previous state of the index
 * page, as well as the heap page(s) it points to.  This is only valid if
 * WAL replay has reached a consistent database state; which means that
 * the preceding check is not just an optimization, but is *necessary*. We
 * won't have let in any user sessions before we reach consistency.
 */
if (!reachedConsistency)
elog(PANIC, "btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data");

..
}
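
To make the ordering of the two checks concrete, here is a minimal sketch in
plain C. It is a toy model, not PostgreSQL code: the enum, the function name
delete_replay_outcome, and its parameters are hypothetical stand-ins for
CountDBBackends(InvalidOid) and reachedConsistency. It only illustrates that
the PANIC branch is reachable when at least one backend exists before
consistency has been reached.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels; these are not PostgreSQL identifiers. */
typedef enum
{
    REPLAY_FAST_PATH,     /* no backends: return early, pages never examined */
    REPLAY_EXAMINE_PAGES, /* backends present and replay is consistent */
    REPLAY_PANIC          /* backends present before consistency was reached */
} ReplayOutcome;

/*
 * Toy model of the two checks quoted above.
 * backend_count stands in for CountDBBackends(InvalidOid);
 * reached_consistency stands in for the reachedConsistency flag.
 */
ReplayOutcome
delete_replay_outcome(int backend_count, bool reached_consistency)
{
    assert(backend_count >= 0);

    /* First check: with no sessions connected, take the fast path. */
    if (backend_count == 0)
        return REPLAY_FAST_PATH;

    /* Second check: sessions exist, so the state must be consistent. */
    if (!reached_consistency)
        return REPLAY_PANIC;

    return REPLAY_EXAMINE_PAGES;
}
```

For example, delete_replay_outcome(1, false) yields REPLAY_PANIC, which is
the combination that can arise if connections are admitted during startup.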



--
View this message in context: 
http://www.postgresql-archive.org/Is-there-possibility-btree-redo-with-XLOG-BTREE-DELETE-done-between-standby-redo-and-the-end-of-backp-tp5963066p5963349.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Re: Is there possibility btree_redo with XLOG_BTREE_DELETE done between standby_redo and the end of backup

2017-05-25 Thread Tom Lane
y39chen writes:
> We found that the panic happened after adding one of our patches.

>   switch (port->canAcceptConnections)
>   {
>       case CAC_STARTUP:
>           ereport(*LOG*,
>                   (errcode(ERRCODE_CANNOT_CONNECT_NOW),
>                    errmsg("the database system is starting up")));
>           break;

Did you not absorb the advice given in

https://www.postgresql.org/message-id/23381.1494607...@sss.pgh.pa.us

?  The above patch is simply an impossibly bad idea.  Don't do it.

regards, tom lane




Re: [GENERAL] Re: Is there possibility btree_redo with XLOG_BTREE_DELETE done between standby_redo and the end of backup

2017-05-25 Thread Michael Paquier
On Thu, May 25, 2017 at 5:23 AM, y39chen wrote:
> My question: the standby is only redoing WAL records from the master, so
> how would accepting connections on the standby during recovery trigger
> btree_xlog_delete_get_latestRemovedXid() and cause the panic?

You should look at the relationship between the code in postmaster.c
that updates pmState and how the startup process (xlog.c) lets the
postmaster know when it can accept incoming connections. Once a set of
conditions is met, the startup process tells the postmaster that it is
safe to accept connections on a hot standby. That's a good study and
the code is well-commented, so I'll let you work out what those
conditions are and how they are met during recovery.
-- 
Michael




[GENERAL] Re: Is there possibility btree_redo with XLOG_BTREE_DELETE done between standby_redo and the end of backup

2017-05-25 Thread y39chen
Thank you for the comments.
We found that the panic happened after adding one of our patches.

static int
ProcessStartupPacket(Port *port, bool SSLdone)
{
..
/*
 * If we're going to reject the connection due to database state, say so
 * now instead of wasting cycles on an authentication exchange. (This also
 * allows a pg_ping utility to be written.)
 */
switch (port->canAcceptConnections)
{
    case CAC_STARTUP:
        ereport(*LOG*,
                (errcode(ERRCODE_CANNOT_CONNECT_NOW),
                 errmsg("the database system is starting up")));
        break;
..
}
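
The effect of changing the report level can be sketched with a small toy
model in plain C. It assumes that the unpatched code reports at a level that
terminates the backend (such as FATAL), while LOG only writes a message and
returns to the caller. The Toy* names and both functions are hypothetical and
are not part of PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-level model of error severities. */
typedef enum { TOY_LOG, TOY_FATAL } ToyLevel;

/* Returns true if control comes back to the caller after the report. */
bool
toy_report_returns(ToyLevel level)
{
    /* Assumption: a FATAL report ends the backend; LOG just logs. */
    return level != TOY_FATAL;
}

/*
 * Returns true if startup-packet processing would carry on past the
 * "database system is starting up" check and go on to authentication.
 */
bool
toy_connection_proceeds(bool db_starting_up, ToyLevel level_used)
{
    assert(level_used == TOY_LOG || level_used == TOY_FATAL);

    if (!db_starting_up)
        return true;        /* nothing to reject */

    return toy_report_returns(level_used);
}
```

With db_starting_up set, TOY_FATAL stops the connection while TOY_LOG lets it
continue, which is how a session can exist before recovery is consistent.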

I understand that the patch accepts connections while the Postgres master
is still recovering, and that this is dangerous in general. When I remove
the patch (that is, keep the original error level), no panic happens, so
the two seem to be related.
I also noticed a post discussing a similar problem:

http://www.postgresql-archive.org/Re-Crash-observed-during-the-start-of-the-Postgres-process-td5958225.html

  

My question: the standby is only redoing WAL records from the master, so
how would accepting connections on the standby during recovery trigger
btree_xlog_delete_get_latestRemovedXid() and cause the panic?

I tried reading the Postgres code to get a clear picture, but I am still
confused.

Could you kindly explain the relationship, or give me a hint on how to
investigate it?
 



--
View this message in context: 
http://www.postgresql-archive.org/Is-there-possibility-btree-redo-with-XLOG-BTREE-DELETE-done-between-standby-redo-and-the-end-of-backp-tp5963066p5963181.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

