Re: [HACKERS] Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

2015-02-03 Thread Andres Freund
On 2015-02-03 14:18:02 +0900, Michael Paquier wrote:
 -  RecoveryLockList contains entry for lock no longer recorded by
 lock manager: xid %u database %u relation %u,
 -  lock-xid, lock-dbOid, lock-relOid);
 +RecoveryLockList contains entry for lock no longer recorded by
 lock manager: xid %u,
 + lock-xid);
 This patch is making the information provided less verbose, and I
 think that it is useful to have some information not only about the
 lock held, but as well about the database and the relation.

It's debug4 or impossible stuff that lock.c already warned about - I
doubt anybody has ever actually looked at it in a long while, if
ever. If we really want to provide something more we can use something
like LOCK_PRINT() - but I really doubt it's worth neither the
notational, nor the verbosity overhead.

 Also, ISTM that StandbyAcquireLock should still use a database OID and
 a relation OID instead of a only LOCKTAG, and SET_LOCKTAG_RELATION
 should be set in StandbyAcquireLock while
 ResolveRecoveryConflictWithLock is extended only with the lock mode as
 new argument. (Patch 2 adds many calls to SET_LOCKTAG_RELATION btw
 justidying to keep he API changes minimal).

But there's now callers acquiring other locks than relation locks, like
dbase_redo() acquiring a object lock. And we need to acquire those via
the standby mechanism to avoid races around release.  We could add a
separate wrapper for relation locks, but imo the locktag move to the
callers saved about as many lines in some places as it cost in others.

 In patch 2, isn't it necessary to bump XLOG_PAGE_MAGIC?

I don't think so, there's no incompatible change.

Thanks for having a look!

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

2015-02-02 Thread Michael Paquier
On Sat, Jan 31, 2015 at 5:34 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-01-29 11:01:51 -0500, Robert Haas wrote:
 On Wed, Jan 28, 2015 at 2:41 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  Andres Freund wrote:
  I think this isn't particularly pretty, but it seems to be working well
  enough, and changing it would be pretty invasive. So keeping in line
  with all that code seems to be easier.
  OK, I'm convinced with this part to remove the call of
  LockSharedObjectForSession that uses dontWait and replace it by a loop
  in ResolveRecoveryConflictWithDatabase.

 That seems right to me, too.

 It's slightly more complicated than that. The lock conflict should
 actually be resolved using ResolveRecoveryConflictWithLock()... That,
 combined with the race of connecting a actually already deleted database
 (see the XXXs I removed) seem to make the approach in here.

 Attached are two patches:
 1) Infrastructure for attaching more kinds of locks on the startup
process.
 2) Use that infrastructure for database locks during replay.

 I'm not sure 2) alone would be sufficient justification for 1), but the
 nearby thread about basebackups also require similar infrastructure...

Some comments about patch 1:
-* No locking is required here because we already acquired
-* AccessExclusiveLock. Anybody trying to connect while we do this will
-* block during InitPostgres() and then disconnect when they see the
-* database has been removed.
+* No locking is required here because we already acquired a
+* AccessExclusiveLock on the database in dbase_redo().
Anybody trying to
+* connect while we do this will block during InitPostgres() and then
+* disconnect when they see the database has been removed.
 */
This change looks unnecessary, I'd rather let it as-is.

-  RecoveryLockList contains entry for lock no longer recorded by
lock manager: xid %u database %u relation %u,
-  lock-xid, lock-dbOid, lock-relOid);
+RecoveryLockList contains entry for lock no longer recorded by
lock manager: xid %u,
+ lock-xid);
This patch is making the information provided less verbose, and I
think that it is useful to have some information not only about the
lock held, but as well about the database and the relation.
Also, ISTM that StandbyAcquireLock should still use a database OID and
a relation OID instead of a only LOCKTAG, and SET_LOCKTAG_RELATION
should be set in StandbyAcquireLock while
ResolveRecoveryConflictWithLock is extended only with the lock mode as
new argument. (Patch 2 adds many calls to SET_LOCKTAG_RELATION btw
justidying to keep he API changes minimal).

There are some typos in the commit message:
s/shanges/changes
s/exlusive/exclusive

In patch 2, isn't it necessary to bump XLOG_PAGE_MAGIC?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

2015-01-30 Thread Andres Freund
On 2015-01-29 11:01:51 -0500, Robert Haas wrote:
 On Wed, Jan 28, 2015 at 2:41 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  Andres Freund wrote:
  I think this isn't particularly pretty, but it seems to be working well
  enough, and changing it would be pretty invasive. So keeping in line
  with all that code seems to be easier.
  OK, I'm convinced with this part to remove the call of
  LockSharedObjectForSession that uses dontWait and replace it by a loop
  in ResolveRecoveryConflictWithDatabase.
 
 That seems right to me, too.

It's slightly more complicated than that. The lock conflict should
actually be resolved using ResolveRecoveryConflictWithLock()... That,
combined with the race of connecting a actually already deleted database
(see the XXXs I removed) seem to make the approach in here.

Attached are two patches:
1) Infrastructure for attaching more kinds of locks on the startup
   process.
2) Use that infrastructure for database locks during replay.

I'm not sure 2) alone would be sufficient justification for 1), but the
nearby thread about basebackups also require similar infrastructure...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From 8531ca4d200d24ee45265774f7ead613563adca4 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Fri, 30 Jan 2015 09:28:17 +0100
Subject: [PATCH 1/4] Allow recovery lock infrastructure to not only hold
 relation locks.

This is a preparatory commit to fix several bugs, separated out for
easier review.

The startup process could so far only properly acquire access exlusive
locks on transactions. As it does not setup enough state to do lock
queuing - primarily to be able to issue recovery conflict interrupts,
and to release them when the primary releases locks - it has it's own
infrastructure to manage locks. That infrastructure so far assumed
that all locks are access exlusive locks on relations.

Unfortunately some code in the startup process has to acquire other
locks than what's supported by the aforementioned infrastructure in
standby.c. Namely dbase_redo() has to acquire locks on the database
objects.  Also further such locks will be added soon, to fix a another
bug.

So this patch shanges the infrastructure to be able to acquire locks
of different modes and locktags.

Additionally allow acquiring more heavyweight relation logs on the
standby than RowExclusive when acquired in session mode.

Discussion: 20150120152819.gc24...@alap3.anarazel.de

Backpatch all the way.
---
 src/backend/storage/ipc/standby.c | 120 ++
 src/backend/storage/lmgr/lock.c   |   7 +--
 src/include/storage/standby.h |   2 +-
 3 files changed, 61 insertions(+), 68 deletions(-)

diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 292bed5..0502aab 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -38,10 +38,16 @@ int			max_standby_archive_delay = 30 * 1000;
 int			max_standby_streaming_delay = 30 * 1000;
 
 static List *RecoveryLockList;
+typedef struct RecoveryLockListEntry
+{
+	TransactionId	xid;
+	LOCKMODE		lockmode;
+	LOCKTAG			locktag;
+} RecoveryLockListEntry;
 
 static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId *waitlist,
 	   ProcSignalReason reason);
-static void ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid);
+static void ResolveRecoveryConflictWithLock(LOCKTAG *tag, LOCKMODE mode);
 static void SendRecoveryConflictWithBufferPin(ProcSignalReason reason);
 static XLogRecPtr LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
 static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
@@ -320,10 +326,10 @@ ResolveRecoveryConflictWithDatabase(Oid dbid)
 	 * us. This is rare enough that we do this as simply as possible: no wait,
 	 * just force them off immediately.
 	 *
-	 * No locking is required here because we already acquired
-	 * AccessExclusiveLock. Anybody trying to connect while we do this will
-	 * block during InitPostgres() and then disconnect when they see the
-	 * database has been removed.
+	 * No locking is required here because we already acquired a
+	 * AccessExclusiveLock on the database in dbase_redo(). Anybody trying to
+	 * connect while we do this will block during InitPostgres() and then
+	 * disconnect when they see the database has been removed.
 	 */
 	while (CountDBBackends(dbid)  0)
 	{
@@ -338,14 +344,11 @@ ResolveRecoveryConflictWithDatabase(Oid dbid)
 }
 
 static void
-ResolveRecoveryConflictWithLock(Oid dbOid, Oid relOid)
+ResolveRecoveryConflictWithLock(LOCKTAG *locktag, LOCKMODE mode)
 {
 	VirtualTransactionId *backends;
 	bool		lock_acquired = false;
 	int			num_attempts = 0;
-	LOCKTAG		locktag;
-
-	SET_LOCKTAG_RELATION(locktag, dbOid, relOid);
 
 	/*
 	 * If blowing away everybody with conflicting locks doesn't work, after
@@ -358,7 +361,7 

Re: [HACKERS] Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

2015-01-29 Thread Robert Haas
On Wed, Jan 28, 2015 at 2:41 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Andres Freund wrote:
 I think this isn't particularly pretty, but it seems to be working well
 enough, and changing it would be pretty invasive. So keeping in line
 with all that code seems to be easier.
 OK, I'm convinced with this part to remove the call of
 LockSharedObjectForSession that uses dontWait and replace it by a loop
 in ResolveRecoveryConflictWithDatabase.

That seems right to me, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

2015-01-27 Thread Andres Freund
On 2015-01-27 16:23:53 +0900, Michael Paquier wrote:
 On Tue, Jan 27, 2015 at 6:24 AM, Andres Freund and...@2ndquadrant.com wrote:
  Unfortunately that Assert()s when there's a lock conflict because
  e.g. another backend is currently connecting. That's because ProcSleep()
  does a enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout) - and
  there's no deadlock timeout (or lock timeout) handler registered.

 Yes, that could logically happen if there is a lock conflicting as
 RowExclusiveLock or lower lock can be taken in recovery.

I don't this specific lock (it's a object, not relation lock) can easily
be taken directly by a user except during authentication.

  [...]
  afaics, that should work? Not pretty, but probably easier than starting
  to reason about the deadlock detector in the startup process.

 Wouldn't it be cleaner to simply register a dedicated handler in
 StartupXlog instead, obviously something else than DEADLOCK_TIMEOUT as
 it is reserved for backend operations? For back-branches, we may even
 consider using DEADLOCK_TIMEOUT..

What would that timeout handler actually do? Two problems:
a) We really don't want the startup process be killed/error out as that
   likely means a postmaster restart. So the default deadlock detector
   strategy is something that's really useless for us.
b) Pretty much all other (if not actually all other) heavyweight lock
   acquisitions in the startup process acquire locks using dontWait =
   true - which means that the deadlock detector isn't actually
   run. That's essentially fine because we simply kill everything in our
   way. C.f. StandbyAcquireAccessExclusiveLock() et al.

There's a dedicated 'deadlock detector' like infrastructure around
ResolveRecoveryConflictWithBufferPin(), but it deals with a class of
deadlocks that's not handled in the deadlock detector anyway.


I think this isn't particularly pretty, but it seems to be working well
enough, and changing it would be pretty invasive. So keeping in line
with all that code seems to be easier.

  We probably should also add a Assert(!InRecovery || sessionLock) to
  LockAcquireExtended() - these kind of problems are otherwise hard to
  find in a developer setting.

 So this means that locks other than session ones cannot be taken while
 a node is in recovery, but RowExclusiveLock can be taken while in
 recovery. Don't we have a problem with this assertion then?

Note that InRecovery doesn't mean what you probably think it means:
/*
 * Are we doing recovery from XLOG?
 *
 * This is only ever true in the startup process; it should be read as meaning
 * this process is replaying WAL records, rather than the system is in
 * recovery mode.  It should be examined primarily by functions that need
 * to act differently when called from a WAL redo function (e.g., to skip WAL
 * logging).  To check whether the system is in recovery regardless of which
 * process you're running in, use RecoveryInProgress() but only after shared
 * memory startup and lock initialization.
 */
boolInRecovery = false;

The assertion actually should be even stricter:
Assert(InRecovery || (sessionLock  dontWait));
i.e. we never acquire a heavyweight lock in the startup process unless
it's a session lock (as we don't have resource managers/a xact to track
locks) and we don't wait (as we don't have the deadlock detector
infrastructure set up).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

2015-01-27 Thread Michael Paquier
Andres Freund wrote:
 I think this isn't particularly pretty, but it seems to be working well
 enough, and changing it would be pretty invasive. So keeping in line
 with all that code seems to be easier.
OK, I'm convinced with this part to remove the call of
LockSharedObjectForSession that uses dontWait and replace it by a loop
in ResolveRecoveryConflictWithDatabase.

 Note that InRecovery doesn't mean what you probably think it means:
 [stuff]
 boolInRecovery = false;
Yes, right. I misunderstood with RecoveryInProgress().

 The assertion actually should be even stricter:
 Assert(!InRecovery || (sessionLock  dontWait));
 i.e. we never acquire a heavyweight lock in the startup process unless
 it's a session lock (as we don't have resource managers/a xact to track
 locks) and we don't wait (as we don't have the deadlock detector
 infrastructure set up).
No problems with this assertion here.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

2015-01-26 Thread Michael Paquier
On Tue, Jan 27, 2015 at 6:24 AM, Andres Freund and...@2ndquadrant.com wrote:
 Unfortunately that Assert()s when there's a lock conflict because
 e.g. another backend is currently connecting. That's because ProcSleep()
 does a enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout) - and
 there's no deadlock timeout (or lock timeout) handler registered.
Yes, that could logically happen if there is a lock conflicting as
RowExclusiveLock or lower lock can be taken in recovery.

 [...]
 afaics, that should work? Not pretty, but probably easier than starting
 to reason about the deadlock detector in the startup process.
Wouldn't it be cleaner to simply register a dedicated handler in
StartupXlog instead, obviously something else than DEADLOCK_TIMEOUT as
it is reserved for backend operations? For back-branches, we may even
consider using DEADLOCK_TIMEOUT..

 We probably should also add a Assert(!InRecovery || sessionLock) to
 LockAcquireExtended() - these kind of problems are otherwise hard to
 find in a developer setting.
So this means that locks other than session ones cannot be taken while
a node is in recovery, but RowExclusiveLock can be taken while in
recovery. Don't we have a problem with this assertion then?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-03-07 Thread Kyotaro HORIGUCHI
Hello,

  After all, I have confirmed that this fixes the problem on crash
  recovery of hot-standby botfor 9.3 and HEAD and no problem was
  found except unreadability :(
 
 Ok, committed. Thanks!

Thank you.

 Any concrete suggestions about the readability? Is there some
 particular spot that needs clarifying?

Well, Although I became to see no problem there after I understood how
it works:-), I'll write about two points where I had difficulties to
understand.

|  * (or the checkpoint record itself, if it's a shutdown checkpoint).
|  */
| if (checkPoint.redo  RecPtr)

First, it was a bit tough work to confirm the equivalence between
(redo == RecPtr) and that the checkpoint is shutdown checkpoint.
Although finally I was convinced that it surely holds, that is
actually not the point. The point here is in the first half of the
phrase. The comment might be less perplexing if it were as folowing
even if only shutdown checkpoint satisfies the condition. But it would
occur another quiestion in readers' mind.

|  * (or the checkpoint record itself, e.g. if it's a shutdown checkpoint).

Second, the added code depends on the assumption that RecPtr points to
the checkpoint record and EndRecPtr points to the next record there.
It would be better for understandability and stability (against
modifying code) to explicitly declare the precondition, like this

| Here RecPtr points the checkpoint record and EndRecPtr points to the
| place for the record just after.


  Surely this is the consequence of illegal operation but I think
  it is also not a issue of assertion - which fires on something
  wrong in design or quite rare cases(this case ?).
 
 Ah, I see. Yes, that's definitely a bug. If you don't hit the
 assertion, because the oldestActiveXID is set in the checkpoint record
 even though wal_level is 'archive', or if you simply have assertions
 disabled, the system will start up in hot standby mode even though
 it's not safe.
 
  So it might be
  better to show message as below on the case.
 
  | FATAL:  Checkpoint doesn't have valid oldest active transaction id
  | HINT: Reading WAL might have been written under insufficient
  | wal_level.

Agreed.

 Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though
 wal_level is 'archive'. So the above patch doesn't fix the whole
 problem.
 
 The real bug here is that CheckRequiredParameterValues() tests for
 InArchiveRecovery, when it should be testing for
 ArchiveRecoveryRequested. Otherwise, the checks are not performed when
 going through the crash recovery followed by archive recovery. I
 should've changed that as part of the commit that added the crash
 recovery then archive recovery behavior.
 
 Fixed, thanks for pointing it out!

It's my pleasre.

regrds,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-03-05 Thread Kyotaro HORIGUCHI
Hello, 

After all, I have confirmed that this fixes the problem on crash
recovery of hot-standby botfor 9.3 and HEAD and no problem was
found except unreadability :(

By the way, I moderately want to fix an assertion message to a
ordinary one. Details are below.


The server stops with following message during restarting after
crash requesting archive recovery when the WAL has been produced
with the wal_level below WAL_LEVEL_HOT_STANDBY.

| TRAP: FailedAssertion(!(((oldestActiveXID) != ((TransactionId) 0))), File: 
xlog.c, Line: 6799)
| LOG:  startup process (PID 7270) was terminated by signal 6: Aborted

Surely this is the consequence of illegal operation but I think
it is also not a issue of assertion - which fires on something
wrong in design or quite rare cases(this case ?). So it might be
better to show message as below on the case.

| FATAL:  Checkpoint doesn't have valid oldest active transaction id
| HINT:  Reading WAL might have been written under insufficient wal_level.

This could do in this way,

==
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index e3d5e10..bb6922a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6789,7 +6789,13 @@ StartupXLOG(void)
if (wasShutdown)
oldestActiveXID = 
PrescanPreparedTransactions(xids, nxids);
else
+   {
oldestActiveXID = checkPoint.oldestActiveXid;
+   if (!TransactionIdIsValid(oldestActiveXID))
+   ereport(FATAL,
+   (errmsg(Checkpoint 
doesn't have valid oldest active transaction id),
+errhint(Reading WAL 
might have been written under insufficient wal_level.)));
+   }
Assert(TransactionIdIsValid(oldestActiveXID));
 
/* Tell procarray about the range of xids it has to 
deal with */
=


What do you think about this? Feel free dumping this if you feel
negative on this.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-03-05 Thread Heikki Linnakangas

On 03/05/2014 10:51 AM, Kyotaro HORIGUCHI wrote:

Hello,

After all, I have confirmed that this fixes the problem on crash
recovery of hot-standby botfor 9.3 and HEAD and no problem was
found except unreadability :(


Ok, committed. Thanks!

Any concrete suggestions about the readability? Is there some particular 
spot that needs clarifying?



By the way, I moderately want to fix an assertion message to a
ordinary one. Details are below.


The server stops with following message during restarting after
crash requesting archive recovery when the WAL has been produced
with the wal_level below WAL_LEVEL_HOT_STANDBY.

| TRAP: FailedAssertion(!(((oldestActiveXID) != ((TransactionId) 0))), File: 
xlog.c, Line: 6799)
| LOG:  startup process (PID 7270) was terminated by signal 6: Aborted

Surely this is the consequence of illegal operation but I think
it is also not a issue of assertion - which fires on something
wrong in design or quite rare cases(this case ?).


Ah, I see. Yes, that's definitely a bug. If you don't hit the assertion, 
because the oldestActiveXID is set in the checkpoint record even though 
wal_level is 'archive', or if you simply have assertions disabled, the 
system will start up in hot standby mode even though it's not safe.



So it might be
better to show message as below on the case.

| FATAL:  Checkpoint doesn't have valid oldest active transaction id
| HINT:  Reading WAL might have been written under insufficient wal_level.

This could do in this way,
==
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index e3d5e10..bb6922a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6789,7 +6789,13 @@ StartupXLOG(void)
if (wasShutdown)
oldestActiveXID = 
PrescanPreparedTransactions(xids, nxids);
else
+   {
oldestActiveXID = checkPoint.oldestActiveXid;
+   if (!TransactionIdIsValid(oldestActiveXID))
+   ereport(FATAL,
+   (errmsg(Checkpoint doesn't 
have valid oldest active transaction id),
+errhint(Reading WAL might 
have been written under insufficient wal_level.)));
+   }
Assert(TransactionIdIsValid(oldestActiveXID));

/* Tell procarray about the range of xids it has to 
deal with */
=


What do you think about this? Feel free dumping this if you feel
negative on this.


Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though 
wal_level is 'archive'. So the above patch doesn't fix the whole problem.


The real bug here is that CheckRequiredParameterValues() tests for 
InArchiveRecovery, when it should be testing for 
ArchiveRecoveryRequested. Otherwise, the checks are not performed when 
going through the crash recovery followed by archive recovery. I 
should've changed that as part of the commit that added the crash 
recovery then archive recovery behavior.


Fixed, thanks for pointing it out!

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-03-02 Thread Kyotaro HORIGUCHI
Hello,

At Fri, 28 Feb 2014 14:45:58 +0200, Heikki Linnakangas 
hlinnakan...@vmware.com wrote in 53108506.2010...@vmware.com
  Yes, but the same stuation could be made by restarting crashed
  secondary.
 
 Yeah.
 
  I have no idea about the scenario on whitch this behavior was regarded
  as
  undesirable but anyway I think that the secondry should start
  accepting
  client just after crash recovery is completed.
 
 Agreed, this is a bug.
 
 I don't think your patch is the right fix for this though. Setting
 minRecoveryPoint to EndRecPtr is the right thing to do; EndRecPtr
 points to the end of the last read and replayed record. What's wrong
 in this case is lastReplayedEndRecptr. At the beginning of recovery,
 it's initialized to the REDO point, but with a shutdown checkpoint,
 that's not quite right. When starting from a shutdown checkpoint, REDO
 points to the beginning of the shutdown record, but we've already
 effectively replayed it. The next record we replay is the one after
 the checkpoint.

It's more reasonable. I felt uncelar about that but I forgot to
doubt the correctness of lastReplayedEndRecptr then, but surely
the shutdown record itself was effectively alredy replayed when
the recored is inseretd.

 To see that, I added some elog(LOG) calls:
 
 ~/pgsql.93stable$ bin/postmaster -D data
 LOG:  database system was shut down at 2014-02-28 14:06:18 EET
 LOG:  ReadCheckpointRecord: 0/16479C98
 LOG:  database system is ready to accept connections
 LOG:  autovacuum launcher started
 ^CLOG:  received fast shutdown request
 LOG:  aborting any active transactions
 LOG:  autovacuum launcher shutting down
 LOG:  shutting down
 LOG: INSERT @ 0/16479D00: prev 0/16479C98; xid 0; len 72: XLOG -
 checkpoint: redo 0/16479D00; tli 1; prev tli 1; fpw true; xid
 0/793393; oid 24988; multi 655288; offset 1356722; oldest xid 687 in
 DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
 LOG:  xlog flush request 0/16479D68; write 0/0; flush 0/0
 LOG:  database system is shut down
 ~/pgsql.93stable$ bin/postmaster -D data
 LOG:  database system was shut down at 2014-02-28 14:06:23 EET
 LOG:  ReadCheckpointRecord: 0/16479D00
 LOG:  database system is ready to accept connections
 LOG:  autovacuum launcher started
 Killed
 
 At this point, the last record is the shutdown checkpoint, beginning
 at 16479D00, and the server has been killed (immediate shutdown).
 
 ~/pgsql.93stable$ cp recovery.conf data/recovery.conf
 ~/pgsql.93stable$ bin/postmaster -D data
 LOG: database system was interrupted; last known up at 2014-02-28
 14:06:29 EET
 LOG:  entering standby mode
 LOG:  ReadCheckpointRecord: 0/16479D00
 LOG: database system was not properly shut down; automatic recovery in
 progress
 LOG:  record with zero length at 0/16479D68
 LOG:  reached end of WAL in pg_xlog, entering archive recovery
 LOG:  EndRecPtr: 0/16479D68 lastReplayedEndRecPtr: 0/16479D00
 FATAL: could not connect to the primary server: could not connect to
 server: Connection refused
 ...
 
 Recovery starts from the checkpoint record, but lastReplayedEndRecPtr
 is set to the *beginning* of the checkpoint record, even though the
 checkpoint record has already been effectively replayed, by the feat
 of starting recovery from it. EndRecPtr correctly points to the end of
 the checkpoint record. Because of the incorrect lastReplayedEndRecPtr
 value, the CheckRecoveryConsistency() call concludes that it's not
 consistent.

I completely understood the behavior thanks to your detailed
explanation. (And how to use log messages effectively :-)

I agree that the fix is appropriate.

 I believe the attached fix is the right way to fix this.

It also worked for me. Thank you.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-03-02 Thread Kyotaro HORIGUCHI
Ouch! It brought another bug.

 I completely understood the behavior thanks to your detailed
 explanation. (And how to use log messages effectively :-)

Sorry, I just found that it's wrong, and found another problem
brought by your patch.

 I agree that the fix is appropriate.
 
  I believe the attached fix is the right way to fix this.
 
 It also worked for me. Thank you.

| * as if we had just replayed the record before the REDO location
| * (or the checkpoint record itself, if it's a shutdown checkpoint).

The test script following raises assertion failure. It's added
with 'non-shutdown' checkpoint' just before shutting down
immediately. Starting server aborts with the following message.

| LOG:  database system was not properly shut down; automatic recovery in 
progress
| TRAP: FailedAssertion(!(((oldestActiveXID) != ((TransactionId) 0))), File: 
xlog.c, Line: 6771)
| LOG:  startup process (PID 28561) was terminated by signal 6: Aborted

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

===
#! /bin/sh

killall postgres
rm -rf $PGDATA/*
initdb
pg_ctl start -w
sleep 1
psql postgres -c 'checkpoint'
pg_ctl stop -m i
cat  $PGDATA/recovery.conf EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port= user=repuser application_name=pm01 
keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat  $PGDATA/postgresql.conf EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-03-02 Thread Kyotaro HORIGUCHI
Correcting one point of my last mail.

 Ouch! It brought another bug.

My patch also did.

regards,


  I completely understood the behavior thanks to your detailed
  explanation. (And how to use log messages effectively :-)
 
 Sorry, I just found that it's wrong, and found another problem
 brought by your patch.
 
  I agree that the fix is appropriate.
  
   I believe the attached fix is the right way to fix this.
  
  It also worked for me. Thank you.
 
 | * as if we had just replayed the record before the REDO location
 | * (or the checkpoint record itself, if it's a shutdown checkpoint).
 
 The test script following raises assertion failure. It's added
 with 'non-shutdown' checkpoint' just before shutting down
 immediately. Starting server aborts with the following message.
 
 | LOG:  database system was not properly shut down; automatic recovery in 
 progress
 | TRAP: FailedAssertion(!(((oldestActiveXID) != ((TransactionId) 0))), 
 File: xlog.c, Line: 6771)
 | LOG:  startup process (PID 28561) was terminated by signal 6: Aborted
 
 regards,
 
 -- 
 Kyotaro Horiguchi
 NTT Open Source Software Center
 
 ===
 #! /bin/sh
 
 killall postgres
 rm -rf $PGDATA/*
 initdb
 pg_ctl start -w
 sleep 1
 psql postgres -c 'checkpoint'
 pg_ctl stop -m i
 cat  $PGDATA/recovery.conf EOF
 standby_mode = 'on'
 primary_conninfo = 'host=localhost port= user=repuser 
 application_name=pm01 keepalives_idle=60 keepalives_interval=5 
 keepalives_count=5'
 #restore_command = '/bin/true'
 recovery_target_timeline = 'latest'
 EOF
 cat  $PGDATA/postgresql.conf EOF
 #log_min_messages = debug5
 hot_standby = on
 EOF
 pg_ctl start
 

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-03-02 Thread Kyotaro HORIGUCHI
Hello,

 | * as if we had just replayed the record before the REDO location
 | * (or the checkpoint record itself, if it's a shutdown checkpoint).
 
 The test script following raises assertion failure. It's added
 with 'non-shutdown' checkpoint' just before shutting down
 immediately. Starting server aborts with the following message.
 
 | LOG:  database system was not properly shut down; automatic recovery in 
 progress
 | TRAP: FailedAssertion(!(((oldestActiveXID) != ((TransactionId) 0))), 
 File: xlog.c, Line: 6771)
 | LOG:  startup process (PID 28561) was terminated by signal 6: Aborted

This is because the checkpoint was done with 'wal_level =
minimal'. The server restarts correctly by starting the server
with 'wal_level = hot_standby' at first.

It looks a mere mis-op. The log message looks unfriendly but I'm
uncertain of the necessity of changing it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-02-28 Thread Kyotaro HORIGUCHI
Ouch. this is the same issue to the mail below,

http://www.postgresql.org/message-id/53104595.6060...@lab.ntt.co.jp

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-02-28 Thread Andres Freund
On 2014-02-28 17:55:21 +0900, Kyotaro HORIGUCHI wrote:
 The recovery process stays on 'incosistent' state forever when
 the server has crashed before any wal record is inserted after
 the last checkpoint.

 # killall postgres
 # rm -rf $PGDATA/*
 initdb
 pg_ctl start -w
 sleep 1
 pg_ctl stop -m i
 cat  $PGDATA/recovery.conf EOF
 standby_mode = 'on'
 primary_conninfo = 'host=localhost port= user=repuser 
 application_name=pm01 keepalives_idle=60 keepalives_interval=5 
 keepalives_count=5'
 #restore_command = '/bin/true'
 recovery_target_timeline = 'latest'
 EOF
 cat  $PGDATA/postgresql.conf EOF
 #log_min_messages = debug5
 hot_standby = on
 EOF
 pg_ctl start

Uh. So, if I understand correctly, what you did is to convert a normal
live pg, into a replica that doesn't have a upstream node, right?
Normally the primary will just do an additional write shortly
afterwards, resolving this situation?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-02-28 Thread Kyotaro HORIGUCHI
Hello,

2014/02/28 18:07 Andres Freund :

 On 2014-02-28 17:55:21 +0900, Kyotaro HORIGUCHI wrote:
  The recovery process stays on 'incosistent' state forever when
  the server has crashed before any wal record is inserted after
  the last checkpoint.

  # killall postgres
  # rm -rf $PGDATA/*
  initdb
  pg_ctl start -w
  sleep 1
  pg_ctl stop -m i
  cat  $PGDATA/recovery.conf EOF
  standby_mode = 'on'
  primary_conninfo = 'host=localhost port= user=repuser
application_name=pm01 keepalives_idle=60 keepalives_interval=5
keepalives_count=5'
  #restore_command = '/bin/true'
  recovery_target_timeline = 'latest'
  EOF
  cat  $PGDATA/postgresql.conf EOF
  #log_min_messages = debug5
  hot_standby = on
  EOF
  pg_ctl start

 Uh. So, if I understand correctly, what you did is to convert a normal
 live pg, into a replica that doesn't have a upstream node, right?

Yes, but the same stuation could be made by restarting crashed secondary. I
didn't tried it since my console is far behind now...

Can a restart of secondry after crashing just after the completion of
restartpoint with no further records make the same situation?

I have no idea about the scenario on whitch this behavior was regarded as
undesirable but anyway I think that the secondry should start accepting
client just after crash recovery is completed.

 Normally the primary will just do an additional write shortly
 afterwards, resolving this situation?

Maybe so. I haven't tried though.

regards,
-- 
Kyotaro Horiguchi
NTT Opensource Software Center


Re: [HACKERS] Hot standby doesn't come up on some situation.

2014-02-28 Thread Heikki Linnakangas

On 02/28/2014 11:51 AM, Kyotaro HORIGUCHI wrote:

Hello,

2014/02/28 18:07 Andres Freund :


On 2014-02-28 17:55:21 +0900, Kyotaro HORIGUCHI wrote:

The recovery process stays on 'incosistent' state forever when
the server has crashed before any wal record is inserted after
the last checkpoint.



# killall postgres
# rm -rf $PGDATA/*
initdb
pg_ctl start -w
sleep 1
pg_ctl stop -m i
cat  $PGDATA/recovery.conf EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port= user=repuser

application_name=pm01 keepalives_idle=60 keepalives_interval=5
keepalives_count=5'

#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat  $PGDATA/postgresql.conf EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start


Uh. So, if I understand correctly, what you did is to convert a normal
live pg, into a replica that doesn't have a upstream node, right?


Yes, but the same stuation could be made by restarting crashed secondary.


Yeah.


I have no idea about the scenario on whitch this behavior was regarded as
undesirable but anyway I think that the secondry should start accepting
client just after crash recovery is completed.


Agreed, this is a bug.

I don't think your patch is the right fix for this though. Setting 
minRecoveryPoint to EndRecPtr is the right thing to do; EndRecPtr points 
to the end of the last read and replayed record. What's wrong in this 
case is lastReplayedEndRecptr. At the beginning of recovery, it's 
initialized to the REDO point, but with a shutdown checkpoint, that's 
not quite right. When starting from a shutdown checkpoint, REDO points 
to the beginning of the shutdown record, but we've already effectively 
replayed it. The next record we replay is the one after the checkpoint.


To see that, I added some elog(LOG) calls:

~/pgsql.93stable$ bin/postmaster -D data
LOG:  database system was shut down at 2014-02-28 14:06:18 EET
LOG:  ReadCheckpointRecord: 0/16479C98
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
^CLOG:  received fast shutdown request
LOG:  aborting any active transactions
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  INSERT @ 0/16479D00: prev 0/16479C98; xid 0; len 72: XLOG - 
checkpoint: redo 0/16479D00; tli 1; prev tli 1; fpw true; xid 0/793393; 
oid 24988; multi 655288; offset 1356722; oldest xid 687 in DB 1; oldest 
multi 1 in DB 1; oldest running xid 0; shutdown

LOG:  xlog flush request 0/16479D68; write 0/0; flush 0/0
LOG:  database system is shut down
~/pgsql.93stable$ bin/postmaster -D data
LOG:  database system was shut down at 2014-02-28 14:06:23 EET
LOG:  ReadCheckpointRecord: 0/16479D00
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
Killed

At this point, the last record is the shutdown checkpoint, beginning at 
16479D00, and the server has been killed (immediate shutdown).


~/pgsql.93stable$ cp recovery.conf data/recovery.conf
~/pgsql.93stable$ bin/postmaster -D data
LOG:  database system was interrupted; last known up at 2014-02-28 
14:06:29 EET

LOG:  entering standby mode
LOG:  ReadCheckpointRecord: 0/16479D00
LOG:  database system was not properly shut down; automatic recovery in 
progress

LOG:  record with zero length at 0/16479D68
LOG:  reached end of WAL in pg_xlog, entering archive recovery
LOG:  EndRecPtr: 0/16479D68 lastReplayedEndRecPtr: 0/16479D00
FATAL:  could not connect to the primary server: could not connect to 
server: Connection refused

...

Recovery starts from the checkpoint record, but lastReplayedEndRecPtr is 
set to the *beginning* of the checkpoint record, even though the 
checkpoint record has already been effectively replayed, by the feat of 
starting recovery from it. EndRecPtr correctly points to the end of the 
checkpoint record. Because of the incorrect lastReplayedEndRecPtr value, 
the CheckRecoveryConsistency() call concludes that it's not consistent.


I believe the attached fix is the right way to fix this.

- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7399fd4..a36354f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5457,13 +5457,17 @@ StartupXLOG(void)
 
 		/*
 		 * Initialize shared variables for tracking progress of WAL replay,
-		 * as if we had just replayed the record before the REDO location.
+		 * as if we had just replayed the record before the REDO location
+		 * (or the checkpoint record itself, if it's a shutdown checkpoint).
 		 */
 		SpinLockAcquire(xlogctl-info_lck);
-		xlogctl-replayEndRecPtr = checkPoint.redo;
+		if (checkPoint.redo  RecPtr)
+			xlogctl-replayEndRecPtr = checkPoint.redo;
+		else
+			xlogctl-replayEndRecPtr = EndRecPtr;
 		xlogctl-replayEndTLI = ThisTimeLineID;
-		xlogctl-lastReplayedEndRecPtr = checkPoint.redo;
-		xlogctl-lastReplayedTLI = ThisTimeLineID;
+		xlogctl-lastReplayedEndRecPtr = xlogctl-replayEndRecPtr;
+		xlogctl-lastReplayedTLI = xlogctl-replayEndTLI;
 		

Re: [HACKERS] Hot Standby conflict resolution handling

2013-01-17 Thread Andres Freund
On 2013-01-17 01:38:31 -0500, Tom Lane wrote:
 But having said that ... are we sure this code is not actually broken?
 ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
 cannot safely attempt to send an error message to the client either;
 but ereport(FATAL) will try exactly that.

You're absolutely right.

ISTM, to fix it we would have to either provide a COMERROR like facility
for FATAL errors or just remove the requirement of raising an error
exactly there.
If I remember what I tried before correctly the latter seems to involve
setting openssl into nonblocking mode and check for the saved error in
the EINTR loop in be-secure and re-raise the error from there.

Do we want to backport either - there hasn't been any report that I
could link to it right now, but on the other hand it could possibly
cause rather ugly problems (data leakage, segfaults, code execution
aren't all that improbable)?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby conflict resolution handling

2013-01-17 Thread Tom Lane
Pavan Deolasee pavan.deola...@gmail.com writes:
 On Thu, Jan 17, 2013 at 12:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
 cannot safely attempt to send an error message to the client either;
 but ereport(FATAL) will try exactly that.

 I thought since FATAL will force the backend to exit, we don't care much
 about corrupted OpenSSL state. I even thought that's why we raise ERROR to
 FATAL so that the backend can start in a clean state. But clearly I'm
 missing a point here because you don't think that way.

If we were to simply exit(1), leaving the kernel to close the client
socket, it'd be safe enough because control would never have returned to
OpenSSL.  But this code doesn't do that.  What we're looking at is that
we've interrupted OpenSSL at some arbitrary point, and now we're going
to make fresh calls to it to try to pump the FATAL error message out to
the client.  It seems fairly unlikely that that's safe.  I'm not sure
I credit Andres' worry of arbitrary code execution, but I do fear that
OpenSSL could get confused to the point of freezing up, or even more
likely that it would transmit garbage to the client, which rather
defeats the purpose.

Don't see a nice fix.  The COMMERROR approach (ie, don't try to send
anything to the client, only the log) is not nice at all since the
client would get the impression that the server crashed.  On the other
hand, anything else requires waiting till we get control back from
OpenSSL, which might be a long time, and meanwhile we're still holding
locks that prevent WAL recovery from proceeding.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby conflict resolution handling

2013-01-17 Thread Andres Freund
On 2013-01-17 10:19:23 -0500, Tom Lane wrote:
 Pavan Deolasee pavan.deola...@gmail.com writes:
  On Thu, Jan 17, 2013 at 12:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
  cannot safely attempt to send an error message to the client either;
  but ereport(FATAL) will try exactly that.
 
  I thought since FATAL will force the backend to exit, we don't care much
  about corrupted OpenSSL state. I even thought that's why we raise ERROR to
  FATAL so that the backend can start in a clean state. But clearly I'm
  missing a point here because you don't think that way.
 
 If we were to simply exit(1), leaving the kernel to close the client
 socket, it'd be safe enough because control would never have returned to
 OpenSSL.  But this code doesn't do that.  What we're looking at is that
 we've interrupted OpenSSL at some arbitrary point, and now we're going
 to make fresh calls to it to try to pump the FATAL error message out to
 the client.  It seems fairly unlikely that that's safe.  I'm not sure
 I credit Andres' worry of arbitrary code execution, but I do fear that
 OpenSSL could get confused to the point of freezing up, or even more
 likely that it would transmit garbage to the client, which rather
 defeats the purpose.

I don't think its likely either, I seem to remember it copying arround
function pointers though, so it seems possible with some bad luck.

 Don't see a nice fix.  The COMMERROR approach (ie, don't try to send
 anything to the client, only the log) is not nice at all since the
 client would get the impression that the server crashed.  On the other
 hand, anything else requires waiting till we get control back from
 OpenSSL, which might be a long time, and meanwhile we're still holding
 locks that prevent WAL recovery from proceeding.

I think we can make openssl return pretty much immediately if we assume
recv() can reliably interrupted by signals, possibly by setting the
socket to nonblocking in the signal handler.
We just need to tell openssl not to retry immediately and we should be
fine. Given that quite some people use openssl with nonblocking sockets,
that code path should be reasonably safe.

That still requires ugliness around saving the error and reraising it
after returning from openssl though...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby conflict resolution handling

2013-01-16 Thread Abhijit Menon-Sen
At 2012-12-29 14:23:45 -0500, sfr...@snowman.net wrote:

 Regarding the actual comment, here's the wording that I'd use:

Sorry for nitpicking, but we can't long jumps made me cringe.
Here's a slightly more condensed version:

/*
 * We can't use ereport(ERROR) here, because any longjmps
 * in DoingCommandRead state run the risk of violating our
 * protocol or the SSL protocol, by interrupting OpenSSL in
 * the middle of changing its internal state.
 *
 * Currently, the only option is to promote ERROR to FATAL
 * until we figure out a better way to handle errors in this
 * state.
 */

Patch along these lines attached, which also removes trailing
whitespace from the original patch.

-- Abhijit
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 407c548..5b952d4 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2908,6 +2908,17 @@ ProcessInterrupts(void)
 			DisableNotifyInterrupt();
 			DisableCatchupInterrupt();
 			pgstat_report_recovery_conflict(RecoveryConflictReason);
+
+			/*
+			 * We can't use ereport(ERROR) here, because any longjmps
+			 * in DoingCommandRead state run the risk of violating our
+			 * protocol or the SSL protocol, by interrupting OpenSSL in
+			 * the middle of changing its internal state.
+			 *
+			 * Currently, the only option is to promote ERROR to FATAL
+			 * until we figure out a better way to handle errors in this
+			 * state.
+			 */
 			if (DoingCommandRead)
 ereport(FATAL,
 		(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby conflict resolution handling

2013-01-16 Thread Pavan Deolasee
On Thu, Jan 17, 2013 at 9:26 AM, Abhijit Menon-Sen a...@2ndquadrant.comwrote:

 At 2012-12-29 14:23:45 -0500, sfr...@snowman.net wrote:
 
  Regarding the actual comment, here's the wording that I'd use:

 Sorry for nitpicking, but we can't long jumps made me cringe.
 Here's a slightly more condensed version:

 /*
  * We can't use ereport(ERROR) here, because any longjmps
  * in DoingCommandRead state run the risk of violating our
  * protocol or the SSL protocol, by interrupting OpenSSL in
  * the middle of changing its internal state.
  *
  * Currently, the only option is to promote ERROR to FATAL
  * until we figure out a better way to handle errors in this
  * state.
  */

 Patch along these lines attached, which also removes trailing
 whitespace from the original patch.


Thanks Stephen and Abhijit for improving the comments. I like this wording.
So +1 from my side. Abhijit, do you want to add the patch and change the CF
status appropriately ?

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Hot Standby conflict resolution handling

2013-01-16 Thread Tom Lane
Abhijit Menon-Sen a...@2ndquadrant.com writes:
 Sorry for nitpicking, but we can't long jumps made me cringe.

Agreed :-(

 Here's a slightly more condensed version:

I find this still not quite right, because where it's placed, it applies
to both the DoingCommandRead and !DoingCommandRead branches of the
if/else statement.  The wording would be okay if placed inside the first
if branch, but I think visually that would look ugly.  I'm inclined to
suggest wording it like

* If we're in DoingCommandRead state, we can't use ereport(ERROR),
* because any longjmp would risk interrupting OpenSSL operations
* and thus confusing that library and/or violating wire protocol.

plus your second para as-is.

But having said that ... are we sure this code is not actually broken?
ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
cannot safely attempt to send an error message to the client either;
but ereport(FATAL) will try exactly that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby conflict resolution handling

2013-01-16 Thread Pavan Deolasee
On Thu, Jan 17, 2013 at 12:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:


 But having said that ... are we sure this code is not actually broken?


I'm not.


 ISTM that if we dare not interrupt for fear of confusing OpenSSL, we
 cannot safely attempt to send an error message to the client either;
 but ereport(FATAL) will try exactly that.


I thought since FATAL will force the backend to exit, we don't care much
about corrupted OpenSSL state. I even thought that's why we raise ERROR to
FATAL so that the backend can start in a clean state. But clearly I'm
missing a point here because you don't think that way.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Hot Standby conflict resolution handling

2012-12-29 Thread Stephen Frost
Pavan,

* Pavan Deolasee (pavan.deola...@gmail.com) wrote:
 On Tue, Dec 4, 2012 at 6:01 PM, Pavan Deolasee 
 pavan.deola...@gmail.comwrote:
  Thanks Andres. I also read the original thread and I now understand why we
  are using FATAL here, at least until we have a better solution. Obviously
  the connection reset is no good either because as someone commented in the
  original discussion, I thought that I'm seeing a server crash while it was
  not.
 
 How about attached comment to be added at the appropriate place ? I've
 extracted this mostly from Tom's explanation in the original thread.

I was hoping to see an update to the actual error messages returned in
this patch..  I agree that it's good to add the comments but that
doesn't do anything to help the user out in this case.

Regarding the actual comment, here's the wording that I'd use:

---
If we are in DoingCommandRead state, we can't use ereport(ERROR)
because we can't long jumps in this state.  If we attempt to
longjmps in this state, we not only risk breaking protocol at
our level, but also risk leaving openssl in an inconsistent
state, either violating the ssl protocol or having its internal
state trashed because it was interrupted while in the middle of
changing that state.

Currently, the only option is to promote ERROR to FATAL until
we figure out a way to handle errors more effectively while in
this state.
---

If you agree with that wording update, can you update the patch?

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Hot Standby conflict resolution handling

2012-12-04 Thread Andres Freund
Hi,

On 2012-12-04 12:30:43 +0530, Pavan Deolasee wrote:
 I was trying some simple queries on a Hot Standby with streaming
 replication.

 On standby, I do this:
 postgres=# begin transaction isolation level repeatable read;
 BEGIN
 postgres=# explain verbose select count(b) from test WHERE a  10;

 On master, I insert a bunch of tuples in the table and run VACUUM ANALYZE.
 postgres=# INSERT INTO test VALUES (generate_series(110001,12), 'foo',
 1);
 INSERT 0 1
 postgres=# VACUUM ANALYZE test;
 VACUUM

 After max_standby_streaming_delay, the standby starts cancelling the
 queries. I get an error like this on the standby:
 postgres=# explain verbose select count(b) from test WHERE a  10;
 FATAL:  terminating connection due to conflict with recovery
 DETAIL:  User query might have needed to see row versions that must be
 removed.
 HINT:  In a moment you should be able to reconnect to the database and
 repeat your command.
 server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
 The connection to the server was lost. Attempting reset: Succeeded.

 So I've couple questions/concerns here

 1. Why to throw a FATAL error here ? A plain ERROR should be enough to
 abort the transaction. There are four places in ProcessInterrupts() where
 we throw these kind of errors and three of them are FATAL.

The problem here is that were in IDLE IN TRANSACTION in this case. Which
currently cannot be cancelled (i.e. pg_cancel_backend() just won't do
anything).

There are two problems making this non-trivial. For one, while we're in
IDLE IN TXN the client doesn't expect a response on a protocol level, so
we can't simply ereport() at that time.
For another, when were in IDLE IN TXN we're potentially inside openssl
so we can't jump out of there anyway because that would quite likely
corrupt the internal state of openssl.

I tried to fix this before (c.f. Idle in transaction cancellation or
similar) but while I had some kind of fix for the first issue (i saved
the error and reported it later when the protocol state allows it) I
missed the jumping out of openssl bit. I think its not that hard to
solve though. I remember having something preliminary but I never had
the time to finish it. If I remember correctly the trick was to set
openssl into non-blocking mode temporarily and return to the caller
inside be-secure.c:my_sock_read.
At that location ProcessInterrupts can run safely, error out silently,
and reraise the error once were in the right protocol state.

 2836 else if (RecoveryConflictPending  RecoveryConflictRetryable)
 2837 {
 2838 pgstat_report_recovery_conflict(RecoveryConflictReason);
 2839 ereport(FATAL,
 2840 (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 2841   errmsg(terminating connection due to conflict with
 recovery),
 2842  errdetail_recovery_conflict()));
 2843 }
 2844 else if (RecoveryConflictPending)
 2845 {
 2846 /* Currently there is only one non-retryable recovery
 conflict */
 2847 Assert(RecoveryConflictReason ==
 PROCSIG_RECOVERY_CONFLICT_DATABASE);
 2848 pgstat_report_recovery_conflict(RecoveryConflictReason);
 2849 ereport(FATAL,
 2850 (errcode(ERRCODE_DATABASE_DROPPED),
 2851   errmsg(terminating connection due to conflict with
 recovery),
 2852  errdetail_recovery_conflict()));
 2853 }

 AFAICS the first of these should be ereport(ERROR). Otherwise irrespective
 of whether RecoveryConflictRetryable is true or false, we will always
 ereport(FATAL).

Which is fine, because were below if (ProcDiePending). Note there's a
separate path for QueryCancelPending. We go on to kill connections once
the normal conflict handling has tried several times.

 2. For my test, the error message itself looks wrong because I did not
 actually remove any rows on the master. VACUUM probably marked a bunch of
 pages as all-visible and that should have triggered a conflict on the
 standby in order to support index-only scans. IMHO we should  improve the
 error message to avoid any confusion. Or we can add a new ProcSignalReason
 to differentiate between a cancel due to clean up vs visibilitymap_set()
 operation.

I think we desparately need to improve *all* of these message with
significantly more detail (cause for cancellation, relation, current
xid, conflicting xid, current/last query).

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby conflict resolution handling

2012-12-04 Thread Pavan Deolasee
On Tue, Dec 4, 2012 at 1:44 PM, Andres Freund and...@2ndquadrant.comwrote:


 
  After max_standby_streaming_delay, the standby starts cancelling the
  queries. I get an error like this on the standby:
  postgres=# explain verbose select count(b) from test WHERE a  10;
  FATAL:  terminating connection due to conflict with recovery
  DETAIL:  User query might have needed to see row versions that must be
  removed.
  HINT:  In a moment you should be able to reconnect to the database and
  repeat your command.
  server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.
  The connection to the server was lost. Attempting reset: Succeeded.
 
  So I've couple questions/concerns here
 
  1. Why to throw a FATAL error here ? A plain ERROR should be enough to
  abort the transaction. There are four places in ProcessInterrupts() where
  we throw these kind of errors and three of them are FATAL.

 The problem here is that were in IDLE IN TRANSACTION in this case. Which
 currently cannot be cancelled (i.e. pg_cancel_backend() just won't do
 anything).

 There are two problems making this non-trivial. For one, while we're in
 IDLE IN TXN the client doesn't expect a response on a protocol level, so
 we can't simply ereport() at that time.
 For another, when were in IDLE IN TXN we're potentially inside openssl
 so we can't jump out of there anyway because that would quite likely
 corrupt the internal state of openssl.

 I tried to fix this before (c.f. Idle in transaction cancellation or
 similar) but while I had some kind of fix for the first issue (i saved
 the error and reported it later when the protocol state allows it) I
 missed the jumping out of openssl bit. I think its not that hard to
 solve though. I remember having something preliminary but I never had
 the time to finish it. If I remember correctly the trick was to set
 openssl into non-blocking mode temporarily and return to the caller
 inside be-secure.c:my_sock_read.


Thanks Andres. I also read the original thread and I now understand why we
are using FATAL here, at least until we have a better solution. Obviously
the connection reset is no good either because as someone commented in the
original discussion, I thought that I'm seeing a server crash while it was
not.




 
  AFAICS the first of these should be ereport(ERROR). Otherwise
 irrespective
  of whether RecoveryConflictRetryable is true or false, we will always
  ereport(FATAL).

 Which is fine, because were below if (ProcDiePending). Note there's a
 separate path for QueryCancelPending. We go on to kill connections once
 the normal conflict handling has tried several times.


Ok. Understood.I now see that every path below if (ProcDiePending) will
call FATAL, albeit with different error codes. That explains the current
code.




 I think we desparately need to improve *all* of these message with
 significantly more detail (cause for cancellation, relation, current
 xid, conflicting xid, current/last query).


I agree.

Thanks,
Pavan


-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Hot Standby conflict resolution handling

2012-12-04 Thread Pavan Deolasee
On Tue, Dec 4, 2012 at 6:01 PM, Pavan Deolasee pavan.deola...@gmail.comwrote:



 Thanks Andres. I also read the original thread and I now understand why we
 are using FATAL here, at least until we have a better solution. Obviously
 the connection reset is no good either because as someone commented in the
 original discussion, I thought that I'm seeing a server crash while it was
 not.


How about attached comment to be added at the appropriate place ? I've
extracted this mostly from Tom's explanation in the original thread.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


clarify-fatal-error-in-conflict-resolve.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-12-03 Thread Robert Haas
On Fri, Nov 30, 2012 at 6:41 PM, Andres Freund and...@2ndquadrant.com wrote:
 While we're talking about changing defaults, how about changing the
 default value of the recovery.conf parameter 'standby_mode' to on?
 Not sure about anybody else, but I never want it any other way.

 Hm. But only if there is a recovery.conf I guess?

Yeah, wouldn't make much sense otherwise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-12-01 Thread Albe Laurenz
Magnus Hagander wrote:
 On 30.11.2012 21:02, Andres Freund wrote:
 There are workloads where its detrimental, but in general having it
 default to on improver experience tremendously because getting conflicts
 because of vacuum is rather confusing.

 In the workloads where it might not be a good idea (very long queries on
 the standby, many dead tuples on the primary) you need to think very
 carefuly about the strategy of avoiding conflicts anyway, and explicit
 configuration is required as well.

 Does anybody have an argument against changing the default value?


 -1. By default, I would expect a standby server to not have any meaningful
 impact on the performance of the master. With hot standby feedback, you can
 bloat the master very badly if you're not careful.
 
 I'm with Heikki on the -1 on this. It's certainly unexpected to have
 the slave affect the master by default - people will expect the master
 to be independent.

I agree.

 +1. Having your reporting query time out *shows you* the problem.
 Having the master bloat for you won't show the problem until later -
 when it's much bigger, and it's much more pain to recover from.

I couldn't agree more.

There are different requirements, and there will always be
people who need to change the defaults, but the way it is is
the safest in my opinion.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Simon Riggs
On 30 November 2012 19:02, Andres Freund and...@2ndquadrant.com wrote:

 The subject says it all.

 There are workloads where its detrimental, but in general having it
 default to on improver experience tremendously because getting conflicts
 because of vacuum is rather confusing.

 In the workloads where it might not be a good idea (very long queries on
 the standby, many dead tuples on the primary) you need to think very
 carefuly about the strategy of avoiding conflicts anyway, and explicit
 configuration is required as well.

 Does anybody have an argument against changing the default value?

I don't see a technical objection, perhaps others do.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Robert Haas
On Fri, Nov 30, 2012 at 2:02 PM, Andres Freund and...@2ndquadrant.com wrote:
 Does anybody have an argument against changing the default value?

Well, the disadvantage of it is that the standby can bloat the master,
which might be surprising to some people, too.  But I don't really
have a lot of skin in this game.

While we're talking about changing defaults, how about changing the
default value of the recovery.conf parameter 'standby_mode' to on?
Not sure about anybody else, but I never want it any other way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Josh Berkus

 In the workloads where it might not be a good idea (very long queries on
 the standby, many dead tuples on the primary) you need to think very
 carefuly about the strategy of avoiding conflicts anyway, and explicit
 configuration is required as well.
 
 Does anybody have an argument against changing the default value?

On balance, I think it's a good idea.  It's easier for new users,
conceptually, to deal with table bloat than query cancel.

Have we done testing on how much query cancel it actually eliminates?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Heikki Linnakangas

On 30.11.2012 21:02, Andres Freund wrote:

Hi,

The subject says it all.

There are workloads where its detrimental, but in general having it
default to on improver experience tremendously because getting conflicts
because of vacuum is rather confusing.

In the workloads where it might not be a good idea (very long queries on
the standby, many dead tuples on the primary) you need to think very
carefuly about the strategy of avoiding conflicts anyway, and explicit
configuration is required as well.

Does anybody have an argument against changing the default value?


-1. By default, I would expect a standby server to not have any 
meaningful impact on the performance of the master. With hot standby 
feedback, you can bloat the master very badly if you're not careful.


Think of someone setting up a test server, by setting it up as a standby 
from the master. Now, when someone holds a transaction open in the test 
server, you get bloat in the master. Or if you set up a standby for 
reporting purposes - a very common use case - you would not expect a 
long running ad-hoc query in the standby to bloat the master. That's 
precisely why you set up such a standby in the first place.


You could of course still turn it off, but you would have to know about 
it in the first place. I think it's a reasonable assumption that a 
standby does *not* affect the master (aside from the bandwidth and disk 
space required to retain/ship the WAL). If you have to remember to 
explicitly set a GUC to get that behavior, that's a pretty big gotcha.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Claudio Freire
On Fri, Nov 30, 2012 at 5:46 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

 Think of someone setting up a test server, by setting it up as a standby
 from the master. Now, when someone holds a transaction open in the test
 server, you get bloat in the master. Or if you set up a standby for
 reporting purposes - a very common use case - you would not expect a long
 running ad-hoc query in the standby to bloat the master. That's precisely
 why you set up such a standby in the first place.

Without hot standby feedback, reporting queries are impossible. I've
experienced it. Cancellations make it impossible to finish any
decently complex reporting query.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 While we're talking about changing defaults, how about changing the
 default value of the recovery.conf parameter 'standby_mode' to on?
 Not sure about anybody else, but I never want it any other way.

Dunno, it's been only a couple of days since there was a thread about
somebody who had turned it on and not gotten the results he wanted
(because he was only trying to do a point-in-time recovery not create
a standby).  There's enough other configuration needed to set up a
standby node that I'm not sure flipping this default helps the case
much.

But having said that, would it be practical to get rid of the explicit
standby_mode parameter altogether?  I'm thinking we could assume standby
mode is wanted if primary_conninfo has a nonempty value.

There remains the case of a standby being fed solely from WAL archive
without primary_conninfo, but that's a pretty darn corner-y corner case,
and I doubt it has to be easy to set up.  One possibility for it is to
allow primary_conninfo to be set to none, which would still trigger
standby mode but could be coded to not enable connection attempts.

Mind you, I'm not sure that such a design is easier to understand or
document.  But it would be one less parameter.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Heikki Linnakangas

On 30.11.2012 22:49, Claudio Freire wrote:

On Fri, Nov 30, 2012 at 5:46 PM, Heikki Linnakangas
hlinnakan...@vmware.com  wrote:


Think of someone setting up a test server, by setting it up as a standby
from the master. Now, when someone holds a transaction open in the test
server, you get bloat in the master. Or if you set up a standby for
reporting purposes - a very common use case - you would not expect a long
running ad-hoc query in the standby to bloat the master. That's precisely
why you set up such a standby in the first place.


Without hot standby feedback, reporting queries are impossible. I've
experienced it. Cancellations make it impossible to finish any
decently complex reporting query.


Maybe so, but I'd rather get cancellations in the standby, and then read 
up on feedback and the other options and figure out how to make it work, 
than get severe bloat in the master and scratch my head wondering what's 
causing it.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Kevin Grittner
Claudio Freire wrote:

 Without hot standby feedback, reporting queries are impossible.
 I've experienced it. Cancellations make it impossible to finish
 any decently complex reporting query.

With what setting of max_standby_streaming_delay? I would rather
default that to -1 than default hot_standby_feedback on. That way
what you do on the standby only affects the standby.

A default that allows anyone who has a read-only login to a standby
to bloat the server by default, which may require hours of down
time to correct, seems dangerous to me.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Claudio Freire
On Fri, Nov 30, 2012 at 6:06 PM, Kevin Grittner kgri...@mail.com wrote:

 Without hot standby feedback, reporting queries are impossible.
 I've experienced it. Cancellations make it impossible to finish
 any decently complex reporting query.

 With what setting of max_standby_streaming_delay? I would rather
 default that to -1 than default hot_standby_feedback on. That way
 what you do on the standby only affects the standby.

1d


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Tom Lane
Claudio Freire klaussfre...@gmail.com writes:
 Without hot standby feedback, reporting queries are impossible. I've
 experienced it. Cancellations make it impossible to finish any
 decently complex reporting query.

The original expectation was that slave-side cancels would be
infrequent.  Maybe there's some fixing/tuning to be done there.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Kevin Grittner
Claudio Freire wrote:

 With what setting of max_standby_streaming_delay? I would rather
 default that to -1 than default hot_standby_feedback on. That
 way what you do on the standby only affects the standby.
 
 1d

Was there actually a transaction hanging open for an entire day on
the standby? Was it a query which actually ran that long, or an
ill-behaved user or piece of software?

I have most certainly managed databases where holding up vacuuming
on the source would cripple performance to the point that users
would have demanded that any other process causing it must be
immediately canceled. And canceling it wouldn't be enough at that
point -- the bloat would still need to be fixed before they could
work efficiently.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Claudio Freire
On Fri, Nov 30, 2012 at 6:20 PM, Kevin Grittner kgri...@mail.com wrote:
 Claudio Freire wrote:

 With what setting of max_standby_streaming_delay? I would rather
 default that to -1 than default hot_standby_feedback on. That
 way what you do on the standby only affects the standby.

 1d

 Was there actually a transaction hanging open for an entire day on
 the standby? Was it a query which actually ran that long, or an
 ill-behaved user or piece of software?

No, and if there was, I wouldn't care for it to be cancelled.

Queries were being cancelled way before that timeout was reached,
probably something to do with max_keep_segments on the master side
being unable to keep up for that long.

 I have most certainly managed databases where holding up vacuuming
 on the source would cripple performance to the point that users
 would have demanded that any other process causing it must be
 immediately canceled. And canceling it wouldn't be enough at that
 point -- the bloat would still need to be fixed before they could
 work efficiently.

I wouldn't mind occasional cancels, but these were recurring. When a
query ran long enough, there was no way for it to finish, no matter
how many times you tried. The master never stops being busy, that's
probably a factor.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Heikki Linnakangas

On 30.11.2012 23:40, Claudio Freire wrote:

On Fri, Nov 30, 2012 at 6:20 PM, Kevin Grittnerkgri...@mail.com  wrote:

Claudio Freire wrote:


With what setting of max_standby_streaming_delay? I would rather
default that to -1 than default hot_standby_feedback on. That
way what you do on the standby only affects the standby.


1d


Was there actually a transaction hanging open for an entire day on
the standby? Was it a query which actually ran that long, or an
ill-behaved user or piece of software?


No, and if there was, I wouldn't care for it to be cancelled.

Queries were being cancelled way before that timeout was reached,
probably something to do with max_keep_segments on the master side
being unable to keep up for that long.


Running out of max_keep_segments would produce a different error, 
requiring a new base backup.



I have most certainly managed databases where holding up vacuuming
on the source would cripple performance to the point that users
would have demanded that any other process causing it must be
immediately canceled. And canceling it wouldn't be enough at that
point -- the bloat would still need to be fixed before they could
work efficiently.


I wouldn't mind occasional cancels, but these were recurring. When a
query ran long enough, there was no way for it to finish, no matter
how many times you tried. The master never stops being busy, that's
probably a factor.


Hmm, it sounds like max_standby_streaming_delay=1d didn't work as 
intended for some reason. It should've given the query one day to run 
before canceling it. Unless the standby was running one day behind the 
master already, but that seems unlikely. Any chance you could reproduce 
that?


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Claudio Freire
On Fri, Nov 30, 2012 at 6:49 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I have most certainly managed databases where holding up vacuuming
 on the source would cripple performance to the point that users
 would have demanded that any other process causing it must be
 immediately canceled. And canceling it wouldn't be enough at that
 point -- the bloat would still need to be fixed before they could
 work efficiently.


 I wouldn't mind occasional cancels, but these were recurring. When a
 query ran long enough, there was no way for it to finish, no matter
 how many times you tried. The master never stops being busy, that's
 probably a factor.


 Hmm, it sounds like max_standby_streaming_delay=1d didn't work as intended
 for some reason. It should've given the query one day to run before
 canceling it. Unless the standby was running one day behind the master
 already, but that seems unlikely. Any chance you could reproduce that?

I have a pre-production server with replication for these tests. I
could create a fake stream of writes on it, disable feedback, and see
what happens.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Andres Freund
On 2012-11-30 14:35:37 -0500, Robert Haas wrote:
 On Fri, Nov 30, 2012 at 2:02 PM, Andres Freund and...@2ndquadrant.com wrote:
  Does anybody have an argument against changing the default value?

 Well, the disadvantage of it is that the standby can bloat the master,
 which might be surprising to some people, too.  But I don't really
 have a lot of skin in this game.

Sure, thats a problem. But ISTM that its a problem everyone running
postgres has to know about anyway from running the master itself.

 While we're talking about changing defaults, how about changing the
 default value of the recovery.conf parameter 'standby_mode' to on?
 Not sure about anybody else, but I never want it any other way.

Hm. But only if there is a recovery.conf I guess?

Andres

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Daniel Farina
On Fri, Nov 30, 2012 at 11:35 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Nov 30, 2012 at 2:02 PM, Andres Freund and...@2ndquadrant.com wrote:
 Does anybody have an argument against changing the default value?

 Well, the disadvantage of it is that the standby can bloat the master,
 which might be surprising to some people, too.  But I don't really
 have a lot of skin in this game.

Under this precept, we used to not enable hot standby feedback and
instead allowed more or less unbounded staleness of the standby
through very long cancellation times. Although not immediate,
eventually we decided that enough people were getting confused by
sufficiently long standby delay caused by bad queries and idle in xact
backends, so now we have enabled feedback for new database replicants,
along with some fairly un-aggressive cancellation timeouts. It's all
rather messy and not very satisfying.  We have yet to know if feedback
causes or solves problems, on average.

In very early versions we tried the default cancellation settings, and
query cancellation confused everyone a *lot*.  That went away in a
hurry as a result, so I suppose it's not entirely unreasonable to say
in retrospect that the defaults can be considered kind of bad.

Longer term, I think I'd be keen to switch all our user-controlled
replication to logical except for use cases where the workload of the
standby is under our (and not the user's) control, such as for
failover.

Unfortunately, our experience with the feature and its use suggests
that the contract granted by the mechanisms seen in hot standby are
too complex for full-stack developers to keep in careful consideration
along with all the other things they want to do with their application
and/or have to remember about Postgres to get by.

--
fdr


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Magnus Hagander
On Fri, Nov 30, 2012 at 9:46 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 30.11.2012 21:02, Andres Freund wrote:

 Hi,

 The subject says it all.

 There are workloads where its detrimental, but in general having it
 default to on improver experience tremendously because getting conflicts
 because of vacuum is rather confusing.

 In the workloads where it might not be a good idea (very long queries on
 the standby, many dead tuples on the primary) you need to think very
 carefuly about the strategy of avoiding conflicts anyway, and explicit
 configuration is required as well.

 Does anybody have an argument against changing the default value?


 -1. By default, I would expect a standby server to not have any meaningful
 impact on the performance of the master. With hot standby feedback, you can
 bloat the master very badly if you're not careful.

I'm with Heikki on the -1 on this. It's certainly unexpected to have
the slave affect the master by default - people will expect the master
to be independent.

Also, it doesn't IMHO actually *help*. The big thing that makes it
harder for people to set up replication that way is wal_level=minimal
by default, and in a smaller sense max_wal_senders (but
wal_level=minimal also has the interesting property that it's not
enough to change it to wal_level=hot_standby if you figure it out too
late - you have to turn off hot standby on the slave, start it, have
it catch up, shut it down, and reenable hot standby). And they
requires a *restart* of the master, which is a lot worse than a small
change to the config of the *slave*. So unless you're suggesting to
change the default of those two values as well, I'm not sure it really
helps that much...


 Think of someone setting up a test server, by setting it up as a standby
 from the master. Now, when someone holds a transaction open in the test
 server, you get bloat in the master. Or if you set up a standby for
 reporting purposes - a very common use case - you would not expect a long
 running ad-hoc query in the standby to bloat the master. That's precisely
 why you set up such a standby in the first place.

 You could of course still turn it off, but you would have to know about it
 in the first place. I think it's a reasonable assumption that a standby does
 *not* affect the master (aside from the bandwidth and disk space required to
 retain/ship the WAL). If you have to remember to explicitly set a GUC to get
 that behavior, that's a pretty big gotcha.

+1. Having your reporting query time out *shows you* the problem.
Having the master bloat for you won't show the problem until later -
when it's much bigger, and it's much more pain to recover from.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Magnus Hagander
On Fri, Nov 30, 2012 at 10:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Claudio Freire klaussfre...@gmail.com writes:
 Without hot standby feedback, reporting queries are impossible. I've
 experienced it. Cancellations make it impossible to finish any
 decently complex reporting query.

 The original expectation was that slave-side cancels would be
 infrequent.  Maybe there's some fixing/tuning to be done there.

It depends completely on the query pattern on the master. Saying that
cancellations makes it impossible to finish any decently complex
reporting query is completely incorrect - it depends on the queries
on the *master*, not on the complexity of the query on the slave. I
know a lot of scenarios where query cancels pretty much never happen
at all.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Andres Freund
On 2012-11-30 22:46:06 +0200, Heikki Linnakangas wrote:
 On 30.11.2012 21:02, Andres Freund wrote:
 Hi,
 
 The subject says it all.
 
 There are workloads where its detrimental, but in general having it
 default to on improver experience tremendously because getting conflicts
 because of vacuum is rather confusing.
 
 In the workloads where it might not be a good idea (very long queries on
 the standby, many dead tuples on the primary) you need to think very
 carefuly about the strategy of avoiding conflicts anyway, and explicit
 configuration is required as well.
 
 Does anybody have an argument against changing the default value?

 -1. By default, I would expect a standby server to not have any meaningful
 impact on the performance of the master. With hot standby feedback, you can
 bloat the master very badly if you're not careful.

True. But everyone running postgres hopefully knows the problem
already. So that effect is relatively easy to explain.

The other control possibilities we have are rather hard to understand
and to setup in my experience.

 Think of someone setting up a test server, by setting it up as a standby
 from the master. Now, when someone holds a transaction open in the test
 server, you get bloat in the master. Or if you set up a standby for
 reporting purposes - a very common use case - you would not expect a long
 running ad-hoc query in the standby to bloat the master. That's precisely
 why you set up such a standby in the first place.

But you can't do any meaningful reporting without changing the current
variables around this anyway. If you have any writes on the master
barely any significant query ever completes.
The two basic choices we give people suck more imo:
* you setup a large delay: It possibly takes a very long time to catch
up if the primary dies, you don't see any up2date data in later queries
* you abort queries: You can't do any reporting queries

Both are unusable for most scenarios and getting the former just right
is hard.

Imo a default of on works in far more scenarios than the contrary.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Josh Berkus
All:

Well, the problem is that we have three configurations which only work
for one very common scenario:

- reporting slave: feedback off, very long replication_delay
- load-balancing slave: feedback on, short replication_delay
- backup/failover slave: feedback off, short replication_delay

I don't think anyone without a serious market survey can say that any of
the above scenarios is more common than the others; I run into all three
pretty darned frequently.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Feedback should default to on in 9.3+

2012-11-30 Thread Andres Freund
On 2012-11-30 16:09:15 -0500, Tom Lane wrote:
 Claudio Freire klaussfre...@gmail.com writes:
  Without hot standby feedback, reporting queries are impossible. I've
  experienced it. Cancellations make it impossible to finish any
  decently complex reporting query.

 The original expectation was that slave-side cancels would be
 infrequent.  Maybe there's some fixing/tuning to be done there.

I've mostly seen snapshot conflicts. Its hard to say anything more
precise because we don't log any additional information (its admittedly
not easy).

I think it would already help a lot if
ResolveRecoveryConflictWithSnapshot would log:
* the relfilenode (already passed)
* the removed xid
* the pid of the backend holding the oldest snapshot
* the oldest xid of that backend

Most of that should be easy to get.

But I don't think we really can expect a very low rate of conflicts if
the primary has few longrunning transactions but the standby does.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hot standby PSQL 9.1 Windows 2008 Servers

2012-06-26 Thread Robert Haas
On Tue, May 22, 2012 at 12:15 PM, chinnaobi chinna...@gmail.com wrote:
 You mean when the primary which is going to switch its role to standby might
 not have sent all the WAL records to the standby and If it is switched to
 standby it has more WAL records than the standby which is now serves as
 primary. Is it ??

Yes, that is possible.  Or the standby might have received all the WAL
records but not be caught up in terms of replaying them.

 It is actually the standby server which has to be restored from archive when
 it is switching to primary right .. Not the primary which is switching to
 standby ??

If you want to promote a standby, you can just do it (pg_ctl promote).
 If you have a master that you want to demote to a standby, you've got
to resync it to whatever the current master is.  I understand repmgr
has some tooling to help automate that, although I have not played
with it myself.  In any event rsync can be a big help in reducing the
resync time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hot standby PSQL 9.1 Windows 2008 Servers

2012-05-31 Thread Kevin Grittner
chinnaobi  wrote:
 
 You mean when the primary which is going to switch its role to
 standby might not have sent all the WAL records to the standby and
 If it is switched to standby it has more WAL records than the
 standby which is now serves as primary. Is it ??
 
What happens when there is a network fault between the primary and
the standby, but not between the primary and some of the clients
updating it?  Similarly, if this is asynchronous replication, what if
there have been commits on the primary which were still in the
network buffer when the primary crashed?
 
Clean automated failover is not a trivial task.  If you are writing
your own, it would be best to follow the steps recommended in the
documentation rather closely.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hot standby PSQL 9.1 Windows 2008 Servers

2012-05-31 Thread chinnaobi
Dear Kevin,

Thank you for your reply. Yeah I am writing an application using powershell,
it's true it is not trivial and especially a guy like me who has no idea on
database. 

You raised all the cases which I am muddling with, But currently I am
testing this setup:

Always standby server is configured from base backup and restore from
storage server then start streaming replication(asynchronous). Base backups
are taken frequently. 

I am sure there is some data loss during switching. Still researching how to
do it clean. suggest me if you have any good papers on this ..

Reddy.



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Re-hot-standby-PSQL-9-1-Windows-2008-Servers-tp5710824p5710830.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hot standby PSQL 9.1 Windows 2008 Servers

2012-05-31 Thread chinnaobi
Sorry to mention, In my setup the primary and standby servers receive same
traffic, so no issue with the 
network fault between the primary and the standby, but not between the
primary and some of the clients updating it

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Re-hot-standby-PSQL-9-1-Windows-2008-Servers-tp5710824p5710832.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hot standby PSQL 9.1 Windows 2008 Servers

2012-05-22 Thread Robert Haas
On Mon, May 14, 2012 at 8:18 AM, chinnaobi chinna...@gmail.com wrote:
 I do base backup only first time on standby when it is going to be
 replicated. when ever primary goes down,  standby becomes primary and
 primary becomes standby when primary comes up. When primary becomes standby
 I am restoring data from WAL archive and start postgres service streaming
 replication to connect to primary.

 This setup is working.

I don't think this is safe.  The primary might have WAL that never
made it to the standby, in which case the two machines will be out of
sync with each other and all sorts of bad stuff could happen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hot standby PSQL 9.1 Windows 2008 Servers

2012-05-22 Thread chinnaobi

Dear Robert,

Thank you very much for the reply. 

You mean when the primary which is going to switch its role to standby might
not have sent all the WAL records to the standby and If it is switched to
standby it has more WAL records than the standby which is now serves as
primary. Is it ??

It is actually the standby server which has to be restored from archive when
it is switching to primary right .. Not the primary which is switching to
standby ??

Regards,
Reddy.

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/hot-standby-PSQL-9-1-Windows-2008-Servers-tp5708637p5709495.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Failover Scenario

2012-02-28 Thread Greg Smith

On 02/27/2012 10:05 PM, Lucky Haryadi wrote:

3. When Master A fails to service, the database will failovered to Slave
B by triggering with trigger file.


As soon as you trigger a standby, it changes it to a new timeline.  At 
that point, the series of WAL files diverges.  It's no longer possible 
to apply them to a system that is still on the original timeline, such 
as your original master A in this situation.  There's a good reason for 
that.  Let's say that A committed an additional transaction before it 
went down, but that commit wasn't replicated to B.  You can't just move 
records from B over anymore in that case.  The only way to make sure A 
is in sync again is to do a new base backup, which you can potentially 
accelerate using rsync to only copy what has changed.  I see a lot of 
people try to bypass one of the steps recommended in the manual using 
various schemes like yours, and they usually have a bug like this in 
there--sometimes obvious like this, sometimes subtle.  Trying to get too 
clever here is dangerous to your database.


Warning:  pgsql-hackers is the mailing list for people to discuss the 
development of PostgreSQL, not how to use it.  Questions like this 
should be asked on either the pgsql-admin or pgsql-general mailing list. 
 I'm not going to answer additional questions like this from you here 
on this list, and I doubt anyone else will either.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby fails if any backend crashes

2012-02-04 Thread Simon Riggs
On Fri, Feb 3, 2012 at 4:48 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 I think saner behavior might only require this change:

            /*
             * Any unexpected exit (including FATAL exit) of the startup
             * process is treated as a crash, except that we don't want to
             * reinitialize.
             */
            if (!EXIT_STATUS_0(exitstatus))
            {
 -               RecoveryError = true;
 +               if (!FatalError)
 +                   RecoveryError = true;
                HandleChildCrash(pid, exitstatus,
                                 _(startup process));
                continue;
            }

 plus suitable comment adjustments of course.  Haven't tested this yet
 though.

Looks good, will test.

 It's a bit disturbing that nobody has reported this from the field yet.
 Seems to imply that hot standby isn't being used much.

There are many people I know using it in production for more than a year now.

Either they haven't seen it or they haven't reported it to us.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby fails if any backend crashes

2012-02-03 Thread Daniel Farina
On Thu, Feb 2, 2012 at 8:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 It's a bit disturbing that nobody has reported this from the field yet.
 Seems to imply that hot standby isn't being used much.

I have seen this, but didn't get to dig in, as I thought it could be a
problem from other things done outside Postgres (it also came up in
#6200, but I didn't mention it).

Consider it retroactively reported.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby fails if any backend crashes

2012-02-02 Thread Tom Lane
I wrote:
 I'm currently working with Duncan Rance's test case for bug #6425, and
 I am observing a very nasty behavior in HEAD: once one of the
 hot-standby query backends crashes, the standby postmaster SIGQUIT's
 all its children and then just quits itself, with no log message and
 apparently no effort to restart.  Surely this is not intended?

I looked through postmaster.c and found that the cause of this is pretty
obvious: if the startup process exits with any non-zero status, we
assume that represents an unrecoverable error condition, and set
RecoveryError which causes the postmaster to exit silently as soon as
its last child is gone.  But we do this even if the reason the startup
process did exit(1) is that we sent it SIGQUIT as a result of a crash of
some other process.  Of course this logic dates from a time where the
startup process could not have any siblings, so when it was written,
such a thing was impossible.

I think saner behavior might only require this change:

/*
 * Any unexpected exit (including FATAL exit) of the startup
 * process is treated as a crash, except that we don't want to
 * reinitialize.
 */
if (!EXIT_STATUS_0(exitstatus))
{
-   RecoveryError = true;
+   if (!FatalError)
+   RecoveryError = true;
HandleChildCrash(pid, exitstatus,
 _(startup process));
continue;
}

plus suitable comment adjustments of course.  Haven't tested this yet
though.

It's a bit disturbing that nobody has reported this from the field yet.
Seems to imply that hot standby isn't being used much.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby fails if any backend crashes

2012-02-02 Thread Fujii Masao
On Fri, Feb 3, 2012 at 1:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 I'm currently working with Duncan Rance's test case for bug #6425, and
 I am observing a very nasty behavior in HEAD: once one of the
 hot-standby query backends crashes, the standby postmaster SIGQUIT's
 all its children and then just quits itself, with no log message and
 apparently no effort to restart.  Surely this is not intended?

 I looked through postmaster.c and found that the cause of this is pretty
 obvious: if the startup process exits with any non-zero status, we
 assume that represents an unrecoverable error condition, and set
 RecoveryError which causes the postmaster to exit silently as soon as
 its last child is gone.  But we do this even if the reason the startup
 process did exit(1) is that we sent it SIGQUIT as a result of a crash of
 some other process.  Of course this logic dates from a time where the
 startup process could not have any siblings, so when it was written,
 such a thing was impossible.

 I think saner behavior might only require this change:

            /*
             * Any unexpected exit (including FATAL exit) of the startup
             * process is treated as a crash, except that we don't want to
             * reinitialize.
             */
            if (!EXIT_STATUS_0(exitstatus))
            {
 -               RecoveryError = true;
 +               if (!FatalError)
 +                   RecoveryError = true;
                HandleChildCrash(pid, exitstatus,
                                 _(startup process));
                continue;
            }

 plus suitable comment adjustments of course.  Haven't tested this yet
 though.

Looks good to me.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby off of hot standby?

2012-01-30 Thread Simon Riggs
On Mon, Jan 30, 2012 at 4:36 AM, Igor Schtein ischt...@gmail.com wrote:

 Is it possible to use a standby instance as a master/primary for another
 standby in Postgres 9.0?  In other words, does PG 9.0 supports cascading
 standby configuration?

No, but 9.2 will support that feature, known as cascading replication.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby off of hot standby?

2012-01-30 Thread Josh Berkus
On 1/29/12 8:36 PM, Igor Schtein wrote:
 Hi,
 
 Is it possible to use a standby instance as a master/primary for another
 standby in Postgres 9.0?  In other words, does PG 9.0 supports cascading
 standby configuration?

No, that's a 9.2 feature in development.

If you can build PostgreSQL from source and apply patches, we could use
your help testing it!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-11-02 Thread Simon Riggs
On Fri, Oct 28, 2011 at 3:42 AM, Chris Redekop ch...@replicon.com wrote:

 On a side note I am sporadically seeing another error on hotstandby startup.
  I'm not terribly concerned about it as it is pretty rare and it will work
 on a retry so it's not a big deal.  The error is FATAL:  out-of-order XID
 insertion in KnownAssignedXids.  If you think it might be a bug and are
 interested in hunting it down let me know and I'll help any way I can...but
 if you're not too worried about it then neither am I :)

I'd be interested to see further details of this if you see it again,
or have access to previous logs.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-11-02 Thread Chris Redekop
oopsreply-to-all

-- Forwarded message --
From: Chris Redekop ch...@replicon.com
Date: Wed, Nov 2, 2011 at 8:41 AM
Subject: Re: [HACKERS] Hot Standby startup with overflowed snapshots
To: Simon Riggs si...@2ndquadrant.com


Sure, I've got quite a few logs lying around - I've attached 3 of 'em...let
me know if there are any specific things you'd like me to do or look for
next time it happens


On Wed, Nov 2, 2011 at 2:59 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On Fri, Oct 28, 2011 at 3:42 AM, Chris Redekop ch...@replicon.com wrote:

  On a side note I am sporadically seeing another error on hotstandby
 startup.
   I'm not terribly concerned about it as it is pretty rare and it will
 work
  on a retry so it's not a big deal.  The error is FATAL:  out-of-order
 XID
  insertion in KnownAssignedXids.  If you think it might be a bug and are
  interested in hunting it down let me know and I'll help any way I
 can...but
  if you're not too worried about it then neither am I :)

 I'd be interested to see further details of this if you see it again,
 or have access to previous logs.

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services



postgresql-2011-10-27_202007.log
Description: Binary data


postgresql-2011-10-31_152925.log
Description: Binary data


postgresql-2011-11-01_094501.log
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Chris Redekop
Thanks for the patch Simon, but unfortunately it does not resolve the issue
I am seeing.  The standby still refuses to finish starting up until long
after all clients have disconnected from the primary (10 minutes).  I do
see your new log statement on startup, but only once - it does not repeat.
 Is there any way for me to see  what the oldest xid on the standby is via
controldata or something like that?  The standby does stream to keep up with
the primary while the primary has load, and then it becomes idle when the
primary becomes idle (when I kill all the connections)so it appears to
be current...but it just doesn't finish starting up

I'm not sure if it's relevant, but after it has sat idle for a couple
minutes I start seeing these statements in the log (with the same offset
every time):
DEBUG:  skipping restartpoint, already performed at 9/9520



On Thu, Oct 27, 2011 at 7:26 AM, Simon Riggs si...@2ndquadrant.com wrote:

 Chris Redekop's recent report of slow startup for Hot Standby has made
 me revisit the code there.

 Although there isn't a bug, there is a missed opportunity for starting
 up faster which could be the source of Chris' annoyance.

 The following patch allows a faster startup in some circumstances.

 The patch also alters the log levels for messages and gives a single
 simple message for this situation. The log will now say

  LOG:  recovery snapshot waiting for non-overflowed snapshot or until
 oldest active xid on standby is at least %u (now %u)
  ...multiple times until snapshot non-overflowed or xid reached...

 whereas before the first LOG message shown was

  LOG:  consistent state delayed because recovery snapshot incomplete
  and only later, at DEBUG2 do you see
  LOG:  recovery snapshot waiting for %u oldest active xid on standby is %u
  ...multiple times until xid reached...

 Comments please.

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 5:26 PM, Chris Redekop ch...@replicon.com wrote:

 Thanks for the patch Simon, but unfortunately it does not resolve the issue
 I am seeing.  The standby still refuses to finish starting up until long
 after all clients have disconnected from the primary (10 minutes).  I do
 see your new log statement on startup, but only once - it does not repeat.
  Is there any way for me to see  what the oldest xid on the standby is via
 controldata or something like that?  The standby does stream to keep up with
 the primary while the primary has load, and then it becomes idle when the
 primary becomes idle (when I kill all the connections)so it appears to
 be current...but it just doesn't finish starting up
 I'm not sure if it's relevant, but after it has sat idle for a couple
 minutes I start seeing these statements in the log (with the same offset
 every time):
 DEBUG:  skipping restartpoint, already performed at 9/9520

OK, so it looks like there are 2 opportunities to improve, not just one.

Try this.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


faster_hot_standby_startup_withsubxacts.v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Chris Redekop
hrmz, still basically the same behaviour.  I think it might be a *little*
better with this patch.  Before when under load it would start up quickly
maybe 2 or 3 times out of 10 attemptswith this patch it might be up to 4
or 5 times out of 10...ish...or maybe it was just fluke *shrug*.  I'm still
only seeing your log statement a single time (I'm running at debug2).  I
have discovered something though - when the standby is in this state if I
force a checkpoint on the primary then the standby comes right up.  Is there
anything I check or try for you to help figure this out?or is it
actually as designed that it could take 10-ish minutes to start up even
after all clients have disconnected from the primary?


On Thu, Oct 27, 2011 at 11:27 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On Thu, Oct 27, 2011 at 5:26 PM, Chris Redekop ch...@replicon.com wrote:

  Thanks for the patch Simon, but unfortunately it does not resolve the
 issue
  I am seeing.  The standby still refuses to finish starting up until long
  after all clients have disconnected from the primary (10 minutes).  I do
  see your new log statement on startup, but only once - it does not
 repeat.
   Is there any way for me to see  what the oldest xid on the standby is
 via
  controldata or something like that?  The standby does stream to keep up
 with
  the primary while the primary has load, and then it becomes idle when the
  primary becomes idle (when I kill all the connections)so it appears
 to
  be current...but it just doesn't finish starting up
  I'm not sure if it's relevant, but after it has sat idle for a couple
  minutes I start seeing these statements in the log (with the same offset
  every time):
  DEBUG:  skipping restartpoint, already performed at 9/9520

 OK, so it looks like there are 2 opportunities to improve, not just one.

 Try this.

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 10:09 PM, Chris Redekop ch...@replicon.com wrote:

 hrmz, still basically the same behaviour.  I think it might be a *little*
 better with this patch.  Before when under load it would start up quickly
 maybe 2 or 3 times out of 10 attemptswith this patch it might be up to 4
 or 5 times out of 10...ish...or maybe it was just fluke *shrug*.  I'm still
 only seeing your log statement a single time (I'm running at debug2).  I
 have discovered something though - when the standby is in this state if I
 force a checkpoint on the primary then the standby comes right up.  Is there
 anything I check or try for you to help figure this out?or is it
 actually as designed that it could take 10-ish minutes to start up even
 after all clients have disconnected from the primary?

Thanks for testing. The improvements cover specific cases, so its not
subject to chance; its not a performance patch.

It's not designed to act the way you describe, but it does.

The reason this occurs is that you have a transaction heavy workload
with occasional periods of complete quiet and a base backup time that
is much less than checkpoint_timeout. If your base backup was slower
the checkpoint would have hit naturally before recovery had reached a
consistent state. Which seems fairly atypical. I guess you're doing
this on a test system.

It seems cheap to add in a call to LogStandbySnapshot() after each
call to pg_stop_backup().

Does anyone think this case is worth adding code for? Seems like one
more thing to break.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Chris Redekop
Sorry...designed was poor choice of words, I meant not unexpected.
 Doing the checkpoint right after pg_stop_backup() looks like it will work
perfectly for me, so thanks for all your help!

On a side note I am sporadically seeing another error on hotstandby startup.
 I'm not terribly concerned about it as it is pretty rare and it will work
on a retry so it's not a big deal.  The error is FATAL:  out-of-order XID
insertion in KnownAssignedXids.  If you think it might be a bug and are
interested in hunting it down let me know and I'll help any way I can...but
if you're not too worried about it then neither am I :)


On Thu, Oct 27, 2011 at 4:55 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On Thu, Oct 27, 2011 at 10:09 PM, Chris Redekop ch...@replicon.com
 wrote:

  hrmz, still basically the same behaviour.  I think it might be a *little*
  better with this patch.  Before when under load it would start up quickly
  maybe 2 or 3 times out of 10 attemptswith this patch it might be up
 to 4
  or 5 times out of 10...ish...or maybe it was just fluke *shrug*.  I'm
 still
  only seeing your log statement a single time (I'm running at debug2).  I
  have discovered something though - when the standby is in this state if I
  force a checkpoint on the primary then the standby comes right up.  Is
 there
  anything I check or try for you to help figure this out?or is it
  actually as designed that it could take 10-ish minutes to start up even
  after all clients have disconnected from the primary?

 Thanks for testing. The improvements cover specific cases, so its not
 subject to chance; its not a performance patch.

 It's not designed to act the way you describe, but it does.

 The reason this occurs is that you have a transaction heavy workload
 with occasional periods of complete quiet and a base backup time that
 is much less than checkpoint_timeout. If your base backup was slower
 the checkpoint would have hit naturally before recovery had reached a
 consistent state. Which seems fairly atypical. I guess you're doing
 this on a test system.

 It seems cheap to add in a call to LogStandbySnapshot() after each
 call to pg_stop_backup().

 Does anyone think this case is worth adding code for? Seems like one
 more thing to break.

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Robert Haas
On Thu, Oct 27, 2011 at 6:55 PM, Simon Riggs si...@2ndquadrant.com wrote:
 It seems cheap to add in a call to LogStandbySnapshot() after each
 call to pg_stop_backup().

 Does anyone think this case is worth adding code for? Seems like one
 more thing to break.

Why at that particular time?

It would maybe nice if the master could notice when it has a plausible
(non-overflowed) snapshot and log it then.  But that might be more
code than the problem is worth.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hot standby startup, visibility map, clog

2011-06-11 Thread Robert Haas
On Thu, Jun 9, 2011 at 5:14 AM, Daniel Farina dan...@heroku.com wrote:
 The first fact is that turning off hot standby will let the cluster
 start up, but only after seeing a spate of messages like these (dozen
 or dozens, not thousands):

 2011-06-09 08:02:32 UTC  LOG:  restored log file
 0002002C00C0 from archive
 2011-06-09 08:02:33 UTC  WARNING:  xlog min recovery request
 2C/C1F09658 is past current point 2C/C037B278
 2011-06-09 08:02:33 UTC  CONTEXT:  writing block 0 of relation
 base/16385/16784_vm
        xlog redo insert: rel 1663/16385/128029; tid 114321/63
 2011-06-09 08:02:33 UTC  LOG:  restartpoint starting: xlog

 Most importantly, *all* such messages are in visibility map forks
 (_vm).  I reasonably confident that my code does not start reading
 data until pg_start_backup() has returned, and blocks on
 pg_stop_backup() after having read all the data.  Also, the mailing
 list correspondence at
 http://archives.postgresql.org/pgsql-hackers/2010-11/msg02034.php
 suggests that the visibility map is not flushed at checkpoints, so
 perhaps with some poor timing an old page can wander onto disk even
 after a checkpoint barrier that pg_start_backup waits for. (I have not
 yet found the critical section that makes visibilitymap buffers immune
 to checkpoint though).

I don't believe there's anything that makes visibilitymap buffers
immune to checkpoint.  I might be wrong.  You could probably find out
by inserting a few lines of code into the buffer manager to inspect
each buffer tag as the buffer is written out and elog() if the fork
number is VISIBILITYMAP_FORKNUM.  There is some weirdness around how
the LSNs of visibility map pages are set, however.  See
visibilitymap_set()/clear().  Clearing a visibility map bit is done
without touching the LSN at all; setting the bit advances the LSN to
that of the heap page, if it currently precedes that value.  It's not
obvious to me whether or how that's related.

I'm a bit puzzled by the warning message itself, too.  I think that in
order to generate that warning, you need to try to flush a page to
disk whose LSN is higher than the current minimum recovery point.  And
the minimum recovery point is initially set to point to the redo
pointer from the last checkpoint.  But during an online backup, it's
expected that there may be pages on disk with LSNs higher than the
redo pointer from the last checkpoint.  So actually now I'm not sure
why people don't see this warning whenever a lengthy recovery is
required to reach consitency, especially with small values of
shared_buffers.  I'm probably missing something here...

One thing that might make it easier to understand what's going on here
is to crank the log level up to DEBUG2 and post the full output from a
recovery that exhibits this problem, rather than just a few lines.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby btree delete records and vacuum_defer_cleanup_age

2011-03-11 Thread Fujii Masao
On Fri, Mar 11, 2011 at 4:46 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Is this an open item for 9.1?

 Simon fixed it, commit b9075a6d2f9b07a00262a670dd60272904c79dce.

Oh, thanks!

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby btree delete records and vacuum_defer_cleanup_age

2011-03-10 Thread Fujii Masao
On Thu, Dec 9, 2010 at 4:55 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, 2010-12-09 at 00:16 +0100, Heikki Linnakangas wrote:
 On 09.12.2010 00:10, Heikki Linnakangas wrote:
  On 08.12.2010 16:00, Simon Riggs wrote:
  Heikki pointed out to me that the btree delete record processing does
  not respect vacuum_defer_cleanup_age. It should.
 
  Attached patch to implement that.
 
  Looking to commit in next few hours barring objections/suggestions, to
  both HEAD and 9_0_STABLE, in time for next minor release.
 
  Please note that it was Noah Misch that raised this a while ago:
 
  http://archives.postgresql.org/pgsql-hackers/2010-11/msg01919.php

 On closer look, that's not actually the same issue, sorry for the noise..

 Heikki, this one *is* important. Will fix. Thanks for the analysis Noah.

Is this an open item for 9.1?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby btree delete records and vacuum_defer_cleanup_age

2011-03-10 Thread Heikki Linnakangas

On 11.03.2011 06:21, Fujii Masao wrote:

On Thu, Dec 9, 2010 at 4:55 PM, Simon Riggssi...@2ndquadrant.com  wrote:

On Thu, 2010-12-09 at 00:16 +0100, Heikki Linnakangas wrote:

On 09.12.2010 00:10, Heikki Linnakangas wrote:

On 08.12.2010 16:00, Simon Riggs wrote:

Heikki pointed out to me that the btree delete record processing does
not respect vacuum_defer_cleanup_age. It should.

Attached patch to implement that.

Looking to commit in next few hours barring objections/suggestions, to
both HEAD and 9_0_STABLE, in time for next minor release.


Please note that it was Noah Misch that raised this a while ago:

http://archives.postgresql.org/pgsql-hackers/2010-11/msg01919.php


On closer look, that's not actually the same issue, sorry for the noise..


Heikki, this one *is* important. Will fix. Thanks for the analysis Noah.


Is this an open item for 9.1?


Simon fixed it, commit b9075a6d2f9b07a00262a670dd60272904c79dce.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-15 Thread Joachim Wieland
On Tue, Dec 7, 2010 at 3:42 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Ok, I've committed this patch now.

I can confirm that I could continue replaying the logfiles on the
standby host with this patch.


Thanks a lot,
Joachim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby tuning for btree_xlog_vacuum()

2010-12-09 Thread Simon Riggs
Just wanted to say thanks for the review, since I haven't yet managed to
fix and commit this. I expect to later this month.

On Mon, 2010-09-27 at 23:06 -0400, Robert Haas wrote:
 On Thu, Apr 29, 2010 at 4:12 PM, Simon Riggs si...@2ndquadrant.com wrote:
  Simple tuning of btree_xlog_vacuum() using an idea I had a while back,
  just never implemented. XXX comments removed.
 
  Allows us to avoid reading in blocks during VACUUM replay that are only
  required for correctness of index scans.
 
 Review:
 
 1. The block comment in XLogConfirmBufferIsUnpinned appears to be
 copied from somewhere else, and doesn't really seem appropriate for a
 new function since it refers to the original coding of this routine.
  I think you could just delete the parenthesized portion of the
 comment.
 
 2. This bit from ConfirmBufferIsUnpinned looks odd to me.
 
 + /*
 +  * Found it.  Now, pin/unpin the buffer to prove it's unpinned.
 +  */
 + if (PinBuffer(buf, NULL))
 + UnpinBuffer(buf, false);
 
 I don't think pinning and unpinning the buffer is sufficient to
 provide that it isn't otherwise pinned.  If the buffer isn't in shared
 buffers at all, it seems clear that no one can have it pinned.  But if
 it's present in shared buffers, it seems like you still need
 LockBufferForCleanup().

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby btree delete records and vacuum_defer_cleanup_age

2010-12-08 Thread Heikki Linnakangas

On 08.12.2010 16:00, Simon Riggs wrote:

Heikki pointed out to me that the btree delete record processing does
not respect vacuum_defer_cleanup_age. It should.

Attached patch to implement that.

Looking to commit in next few hours barring objections/suggestions, to
both HEAD and 9_0_STABLE, in time for next minor release.


Please note that it was Noah Misch that raised this a while ago:

http://archives.postgresql.org/pgsql-hackers/2010-11/msg01919.php

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby btree delete records and vacuum_defer_cleanup_age

2010-12-08 Thread Heikki Linnakangas

On 09.12.2010 00:10, Heikki Linnakangas wrote:

On 08.12.2010 16:00, Simon Riggs wrote:

Heikki pointed out to me that the btree delete record processing does
not respect vacuum_defer_cleanup_age. It should.

Attached patch to implement that.

Looking to commit in next few hours barring objections/suggestions, to
both HEAD and 9_0_STABLE, in time for next minor release.


Please note that it was Noah Misch that raised this a while ago:

http://archives.postgresql.org/pgsql-hackers/2010-11/msg01919.php


On closer look, that's not actually the same issue, sorry for the noise..

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby btree delete records and vacuum_defer_cleanup_age

2010-12-08 Thread Heikki Linnakangas

On 08.12.2010 16:00, Simon Riggs wrote:


Heikki pointed out to me that the btree delete record processing does
not respect vacuum_defer_cleanup_age. It should.

Attached patch to implement that.


This doesn't look right to me. btree_xlog_delete_get_latestRemovedXid() 
function calculates the latest XID present on the tuples that we're 
removing b-tree pointers for. btree_xlog_delete_get_latestRemovedXid() 
is used during recovery. vacuum_defer_cleanup_age should take effect in 
the master, not during recovery.


With the patch, btree_xlog_delete_get_latestRemovedXid() returns a value 
that's much smaller than it should. That's just wrong, it means that 
recovery in the standby will incorrectly think that all the removed 
tuples are old and not visible to any running read-only queries anymore, 
and will go ahead and remove the index tuples for them.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby btree delete records and vacuum_defer_cleanup_age

2010-12-08 Thread Simon Riggs
On Thu, 2010-12-09 at 00:39 +0100, Heikki Linnakangas wrote:

 vacuum_defer_cleanup_age should take effect in 
 the master, not during recovery. 

Hmmm, more to the point, it does take effect on the master and so there
is no need for this at all. What were you thinking? What was I? Doh.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby btree delete records and vacuum_defer_cleanup_age

2010-12-08 Thread Simon Riggs
On Thu, 2010-12-09 at 00:16 +0100, Heikki Linnakangas wrote:
 On 09.12.2010 00:10, Heikki Linnakangas wrote:
  On 08.12.2010 16:00, Simon Riggs wrote:
  Heikki pointed out to me that the btree delete record processing does
  not respect vacuum_defer_cleanup_age. It should.
 
  Attached patch to implement that.
 
  Looking to commit in next few hours barring objections/suggestions, to
  both HEAD and 9_0_STABLE, in time for next minor release.
 
  Please note that it was Noah Misch that raised this a while ago:
 
  http://archives.postgresql.org/pgsql-hackers/2010-11/msg01919.php
 
 On closer look, that's not actually the same issue, sorry for the noise..

Heikki, this one *is* important. Will fix. Thanks for the analysis Noah.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-07 Thread Heikki Linnakangas

On 02.12.2010 12:31, Heikki Linnakangas wrote:

On 02.12.2010 13:25, Simon Riggs wrote:

On Thu, 2010-12-02 at 12:41 +0200, Heikki Linnakangas wrote:

On 02.12.2010 11:02, Simon Riggs wrote:

The cause of the issue is that replay starts at one LSN and there is a
delay until the RunningXacts WAL record occurs. If there was no delay,
there would be no issue at all. In CreateCheckpoint() we start by
grabbing the WAInsertLock and later recording that pointer as part of
the checkpoint record. My proposal is to replace the grab the lock
code with the insert of the RunningXacts WAL record (when wal_level
set), so that recovery always starts with that record type.


Oh, interesting idea. But AFAICS closing the gap between acquiring the
running-xacts snapshot and writing it to the log is sufficient, I don't
see what moving the running-xacts record buys us. Does it allow some
further simplifications somewhere?


Your patch is quite long and you do a lot more than just alter the
locking. I don't think we need those changes at all and especially would
not wish to backpatch that.


Most of the changes to procarray.c were about removing code that's no
longer necessary when we close the gap between acquiring and writing the
running-xacts WAL record. You can leave it as it is as a historical
curiosity, but I'd prefer to simplify it, given that we now know that it
doesn't actually work correctly if the gap is not closed.


Ok, I've committed this patch now.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-02 Thread Heikki Linnakangas

On 01.12.2010 20:51, Heikki Linnakangas wrote:

Another approach would be to revisit the way the running-xacts snapshot
is taken. Currently, we first take a snapshot, and then WAL-log it.
There is a small window between the steps where backends can begin/end
transactions, and recovery has to deal with that. When this was
designed, there was long discussion on whether we should instead grab
WALInsertLock and ProcArrayLock at the same time, to ensure that the
running-xacts snapshot represents an up-to-date situation at the point
in WAL where it's inserted.

We didn't want to do that because both locks can be heavily contended.
But maybe we should after all. It would make the recovery code simpler.

If we want to get fancy, we wouldn't necessarily need to hold both locks
for the whole duration. We could first grab ProcArrayLock and construct
the snapshot. Then grab WALInsertLock and release ProcArrayLock, and
finally write the WAL record and release WALInsertLock. But that would
require small changes to XLogInsert.


I took a look at that approach. We don't actually need to hold 
ProcArrayLock while the WAL-record is written, we need to hold 
XidGenLock. I believe that's less severe than holding the ProcArrayLock 
as there's already precedence for writing a WAL record while holding 
that: we do that when we advance to a new clog page and write a 
zero-clog-page record.


So this is what we should do IMHO.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 6095,6102  StartupXLOG(void)
  			StartupSUBTRANS(oldestActiveXID);
  			StartupMultiXact();
  
- 			ProcArrayInitRecoveryInfo(oldestActiveXID);
- 
  			/*
  			 * If we're beginning at a shutdown checkpoint, we know that
  			 * nothing was running on the master at this point. So fake-up an
--- 6095,6100 
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***
*** 435,453  ProcArrayClearTransaction(PGPROC *proc)
  }
  
  /*
-  * ProcArrayInitRecoveryInfo
-  *
-  * When trying to assemble our snapshot we only care about xids after this value.
-  * See comments for LogStandbySnapshot().
-  */
- void
- ProcArrayInitRecoveryInfo(TransactionId oldestActiveXid)
- {
- 	latestObservedXid = oldestActiveXid;
- 	TransactionIdRetreat(latestObservedXid);
- }
- 
- /*
   * ProcArrayApplyRecoveryInfo -- apply recovery info about xids
   *
   * Takes us through 3 states: Initialized, Pending and Ready.
--- 435,440 
***
*** 519,533  ProcArrayApplyRecoveryInfo(RunningTransactions running)
  	Assert(standbyState == STANDBY_INITIALIZED);
  
  	/*
! 	 * OK, we need to initialise from the RunningTransactionsData record
! 	 */
! 
! 	/*
! 	 * Remove all xids except xids later than the snapshot. We don't know
! 	 * exactly which ones that is until precisely now, so that is why we allow
! 	 * xids to be added only to remove most of them again here.
  	 */
- 	ExpireOldKnownAssignedTransactionIds(running-nextXid);
  	StandbyReleaseOldLocks(running-nextXid);
  
  	/*
--- 506,514 
  	Assert(standbyState == STANDBY_INITIALIZED);
  
  	/*
! 	 * Release any locks belonging to old transactions that are not
! 	 * running according to the running-xacts record.
  	 */
  	StandbyReleaseOldLocks(running-nextXid);
  
  	/*
***
*** 536,544  ProcArrayApplyRecoveryInfo(RunningTransactions running)
  	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  
  	/*
- 	 * Combine the running xact data with already known xids, if any exist.
  	 * KnownAssignedXids is sorted so we cannot just add new xids, we have to
! 	 * combine them first, sort them and then re-add to KnownAssignedXids.
  	 *
  	 * Some of the new xids are top-level xids and some are subtransactions.
  	 * We don't call SubtransSetParent because it doesn't matter yet. If we
--- 517,524 
  	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  
  	/*
  	 * KnownAssignedXids is sorted so we cannot just add new xids, we have to
! 	 * sort them first.
  	 *
  	 * Some of the new xids are top-level xids and some are subtransactions.
  	 * We don't call SubtransSetParent because it doesn't matter yet. If we
***
*** 547,597  ProcArrayApplyRecoveryInfo(RunningTransactions running)
  	 * xids to subtrans. If RunningXacts is overflowed then we don't have
  	 * enough information to correctly update subtrans anyway.
  	 */
  
  	/*
! 	 * Allocate a temporary array so we can combine xids. The total of both
! 	 * arrays should never normally exceed TOTAL_MAX_CACHED_SUBXIDS.
  	 */
! 	xids = palloc(sizeof(TransactionId) * TOTAL_MAX_CACHED_SUBXIDS);
  
  	/*
! 	 * Get the remaining KnownAssignedXids. In most cases there won't be any
! 	 * at all since this exists only to catch a theoretical race condition.
! 	 */
! 	nxids = KnownAssignedXidsGet(xids, InvalidTransactionId);
! 	if (nxids  0)
! 		

Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-02 Thread Simon Riggs
On Thu, 2010-12-02 at 10:39 +0200, Heikki Linnakangas wrote:
 On 01.12.2010 20:51, Heikki Linnakangas wrote:
  Another approach would be to revisit the way the running-xacts snapshot
  is taken. Currently, we first take a snapshot, and then WAL-log it.
  There is a small window between the steps where backends can begin/end
  transactions, and recovery has to deal with that. When this was
  designed, there was long discussion on whether we should instead grab
  WALInsertLock and ProcArrayLock at the same time, to ensure that the
  running-xacts snapshot represents an up-to-date situation at the point
  in WAL where it's inserted.
 
  We didn't want to do that because both locks can be heavily contended.
  But maybe we should after all. It would make the recovery code simpler.
 
  If we want to get fancy, we wouldn't necessarily need to hold both locks
  for the whole duration. We could first grab ProcArrayLock and construct
  the snapshot. Then grab WALInsertLock and release ProcArrayLock, and
  finally write the WAL record and release WALInsertLock. But that would
  require small changes to XLogInsert.
 
 I took a look at that approach. We don't actually need to hold 
 ProcArrayLock while the WAL-record is written, we need to hold 
 XidGenLock. I believe that's less severe than holding the ProcArrayLock 
 as there's already precedence for writing a WAL record while holding 
 that: we do that when we advance to a new clog page and write a 
 zero-clog-page record.
 
 So this is what we should do IMHO.

Oh, thanks for looking at this. I've been looking at this also and as we
might expect had a slightly different design.

First, your assessment of the locking above is better than mine. I agree
with your analysis so we should do it that way. The locking issue was
the reason I haven't patched this yet so I'm glad you've improved this.

In terms of the rest of the patch, it seems we have different designs, I
think I have a much simpler, less invasive solution:

The cause of the issue is that replay starts at one LSN and there is a
delay until the RunningXacts WAL record occurs. If there was no delay,
there would be no issue at all. In CreateCheckpoint() we start by
grabbing the WAInsertLock and later recording that pointer as part of
the checkpoint record. My proposal is to replace the grab the lock
code with the insert of the RunningXacts WAL record (when wal_level
set), so that recovery always starts with that record type.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-02 Thread Heikki Linnakangas

On 02.12.2010 11:02, Simon Riggs wrote:

The cause of the issue is that replay starts at one LSN and there is a
delay until the RunningXacts WAL record occurs. If there was no delay,
there would be no issue at all. In CreateCheckpoint() we start by
grabbing the WAInsertLock and later recording that pointer as part of
the checkpoint record. My proposal is to replace the grab the lock
code with the insert of the RunningXacts WAL record (when wal_level
set), so that recovery always starts with that record type.


Oh, interesting idea. But AFAICS closing the gap between acquiring the 
running-xacts snapshot and writing it to the log is sufficient, I don't 
see what moving the running-xacts record buys us. Does it allow some 
further simplifications somewhere?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-02 Thread Simon Riggs
On Thu, 2010-12-02 at 12:41 +0200, Heikki Linnakangas wrote:
 On 02.12.2010 11:02, Simon Riggs wrote:
  The cause of the issue is that replay starts at one LSN and there is a
  delay until the RunningXacts WAL record occurs. If there was no delay,
  there would be no issue at all. In CreateCheckpoint() we start by
  grabbing the WAInsertLock and later recording that pointer as part of
  the checkpoint record. My proposal is to replace the grab the lock
  code with the insert of the RunningXacts WAL record (when wal_level
  set), so that recovery always starts with that record type.
 
 Oh, interesting idea. But AFAICS closing the gap between acquiring the 
 running-xacts snapshot and writing it to the log is sufficient, I don't 
 see what moving the running-xacts record buys us. Does it allow some 
 further simplifications somewhere?

Your patch is quite long and you do a lot more than just alter the
locking. I don't think we need those changes at all and especially would
not wish to backpatch that.

Earlier on this thread, we discussed:

On Wed, 2010-11-24 at 15:19 +, Simon Riggs wrote: 
 On Wed, 2010-11-24 at 12:48 +0200, Heikki Linnakangas wrote:
  When recovery starts, we fetch the oldestActiveXid from the checkpoint
  record. Let's say that it's 100. We then start replaying WAL records 
  from the Redo pointer, and the first record (heap insert in your case)
  contains an Xid that's much larger than 100, say 1. We call 
  RecordKnownAssignedXids() to make note that all xids between that
  range are in-progress, but there isn't enough room in the array for
  that.
 
 Agreed.

The current code fails because of the gap between the redo pointer and
the XLOG_RUNNING_XACTS WAL record. If there is no gap, there is no
problem.

So my preferred solution would:
* Log XLOG_RUNNING_XACTS while holding XidGenLock, as you suggest
* Move logging to occur at the Redo pointer

That is a much smaller patch with a smaller footprint.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-02 Thread Heikki Linnakangas

On 02.12.2010 13:25, Simon Riggs wrote:

On Thu, 2010-12-02 at 12:41 +0200, Heikki Linnakangas wrote:

On 02.12.2010 11:02, Simon Riggs wrote:

The cause of the issue is that replay starts at one LSN and there is a
delay until the RunningXacts WAL record occurs. If there was no delay,
there would be no issue at all. In CreateCheckpoint() we start by
grabbing the WAInsertLock and later recording that pointer as part of
the checkpoint record. My proposal is to replace the grab the lock
code with the insert of the RunningXacts WAL record (when wal_level
set), so that recovery always starts with that record type.


Oh, interesting idea. But AFAICS closing the gap between acquiring the
running-xacts snapshot and writing it to the log is sufficient, I don't
see what moving the running-xacts record buys us. Does it allow some
further simplifications somewhere?


Your patch is quite long and you do a lot more than just alter the
locking. I don't think we need those changes at all and especially would
not wish to backpatch that.


Most of the changes to procarray.c were about removing code that's no 
longer necessary when we close the gap between acquiring and writing the 
running-xacts WAL record. You can leave it as it is as a historical 
curiosity, but I'd prefer to simplify it, given that we now know that it 
doesn't actually work correctly if the gap is not closed.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-01 Thread Heikki Linnakangas

On 24.11.2010 12:48, Heikki Linnakangas wrote:

When recovery starts, we fetch the oldestActiveXid from the checkpoint
record. Let's say that it's 100. We then start replaying WAL records
from the Redo pointer, and the first record (heap insert in your case)
contains an Xid that's much larger than 100, say 1. We call
RecordKnownAssignedXids() to make note that all xids between that range
are in-progress, but there isn't enough room in the array for that.

We normally get away with a smallish array because the array is trimmed
at commit and abort records, and the special xid-assignment record to
handle the case of a lot of subtransactions. We initialize the array
from the running-xacts record that's written at a checkpoint. That
mechanism fails in this case because the heap insert record is seen
before the running-xacts record, causing all those xids in the range
100-1 to be considered in-progress. The running-xacts record that
comes later would prune them, but we don't have enough slots to hold
them until that.

Hmm. I'm not sure off the top of my head how to fix that. Perhaps stash
the xids we see during WAL replay in private memory instead of putting
them in the KnownAssignedXids array until we see the running-xacts record.


So, here's a patch using that approach.

Another approach would be to revisit the way the running-xacts snapshot 
is taken. Currently, we first take a snapshot, and then WAL-log it. 
There is a small window between the steps where backends can begin/end 
transactions, and recovery has to deal with that. When this was 
designed, there was long discussion on whether we should instead grab 
WALInsertLock and ProcArrayLock at the same time, to ensure that the 
running-xacts snapshot represents an up-to-date situation at the point 
in WAL where it's inserted.


We didn't want to do that because both locks can be heavily contended. 
But maybe we should after all. It would make the recovery code simpler.


If we want to get fancy, we wouldn't necessarily need to hold both locks 
for the whole duration. We could first grab ProcArrayLock and construct 
the snapshot. Then grab WALInsertLock and release ProcArrayLock, and 
finally write the WAL record and release WALInsertLock. But that would 
require small changes to XLogInsert.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 6095,6102  StartupXLOG(void)
  			StartupSUBTRANS(oldestActiveXID);
  			StartupMultiXact();
  
- 			ProcArrayInitRecoveryInfo(oldestActiveXID);
- 
  			/*
  			 * If we're beginning at a shutdown checkpoint, we know that
  			 * nothing was running on the master at this point. So fake-up an
--- 6095,6100 
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***
*** 435,453  ProcArrayClearTransaction(PGPROC *proc)
  }
  
  /*
-  * ProcArrayInitRecoveryInfo
-  *
-  * When trying to assemble our snapshot we only care about xids after this value.
-  * See comments for LogStandbySnapshot().
-  */
- void
- ProcArrayInitRecoveryInfo(TransactionId oldestActiveXid)
- {
- 	latestObservedXid = oldestActiveXid;
- 	TransactionIdRetreat(latestObservedXid);
- }
- 
- /*
   * ProcArrayApplyRecoveryInfo -- apply recovery info about xids
   *
   * Takes us through 3 states: Initialized, Pending and Ready.
--- 435,440 
***
*** 519,533  ProcArrayApplyRecoveryInfo(RunningTransactions running)
  	Assert(standbyState == STANDBY_INITIALIZED);
  
  	/*
! 	 * OK, we need to initialise from the RunningTransactionsData record
! 	 */
! 
! 	/*
! 	 * Remove all xids except xids later than the snapshot. We don't know
! 	 * exactly which ones that is until precisely now, so that is why we allow
! 	 * xids to be added only to remove most of them again here.
  	 */
- 	ExpireOldKnownAssignedTransactionIds(running-nextXid);
  	StandbyReleaseOldLocks(running-nextXid);
  
  	/*
--- 506,514 
  	Assert(standbyState == STANDBY_INITIALIZED);
  
  	/*
! 	 * Release any locks belonging to old transactions that are not
! 	 * running according to the running-xacts record.
  	 */
  	StandbyReleaseOldLocks(running-nextXid);
  
  	/*
***
*** 536,544  ProcArrayApplyRecoveryInfo(RunningTransactions running)
  	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  
  	/*
- 	 * Combine the running xact data with already known xids, if any exist.
  	 * KnownAssignedXids is sorted so we cannot just add new xids, we have to
! 	 * combine them first, sort them and then re-add to KnownAssignedXids.
  	 *
  	 * Some of the new xids are top-level xids and some are subtransactions.
  	 * We don't call SubtransSetParent because it doesn't matter yet. If we
--- 517,524 
  	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  
  	/*
  	 * KnownAssignedXids is sorted so we cannot just add new xids, we have to
! 	 * sort them first.
  	 *
  	 * Some of the new xids are 

Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-24 Thread Heikki Linnakangas

On 24.11.2010 06:56, Joachim Wieland wrote:

On Tue, Nov 23, 2010 at 8:45 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

On 19.11.2010 23:46, Joachim Wieland wrote:


FATAL:  too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
pArray-maxKnownAssignedXids: 6890


Hmm, that's a lot of entries in KnownAssignedXids.

Can you recompile with WAL_DEBUG, and run the recovery again with
wal_debug=on ? That will print all the replayed WAL records, which is a lot
of data, but it might give a hint what's going on.


Sure, but this gives me only one more line:

[...]
LOG:  redo starts at 1F8/FC00E978
LOG:  REDO @ 1F8/FC00E978; LSN 1F8/FC00EE90: prev 1F8/FC00E930; xid
385669; len 21; bkpb1: Heap - insert: rel 1663/16384/18373; tid
3829898/23
FATAL:  too many KnownAssignedXids
CONTEXT:  xlog redo insert: rel 1663/16384/18373; tid 3829898/23
LOG:  startup process (PID 4587) exited with exit code 1
LOG:  terminating any other active server processes


Thanks, I can reproduce this now. This happens when you have a wide gap 
between the oldest still active xid and the latest xid.


When recovery starts, we fetch the oldestActiveXid from the checkpoint 
record. Let's say that it's 100. We then start replaying WAL records 
from the Redo pointer, and the first record (heap insert in your case) 
contains an Xid that's much larger than 100, say 1. We call 
RecordKnownAssignedXids() to make note that all xids between that range 
are in-progress, but there isn't enough room in the array for that.


We normally get away with a smallish array because the array is trimmed 
at commit and abort records, and the special xid-assignment record to 
handle the case of a lot of subtransactions. We initialize the array 
from the running-xacts record that's written at a checkpoint. That 
mechanism fails in this case because the heap insert record is seen 
before the running-xacts record, causing all those xids in the range 
100-1 to be considered in-progress. The running-xacts record that 
comes later would prune them, but we don't have enough slots to hold 
them until that.


Hmm. I'm not sure off the top of my head how to fix that. Perhaps stash 
the xids we see during WAL replay in private memory instead of putting 
them in the KnownAssignedXids array until we see the running-xacts record.


To reproduce this, I did this in the master:

postgres=# CREATE FUNCTION insertfunc(n integer) RETURNS VOID AS $$
declare
  i integer;
begin
  FOR i IN 1..n LOOP
BEGIN
  INSERT INTO foo VALUES (1);
EXCEPTION WHEN division_by_zero THEN RAISE NOTICE 'divbyzero';
END;
  END LOOP;
end;
$$ LANGUAGE plpgsql;
postgres=# SELECT insertfunc(1);

After letting that run for a while, so that a couple of checkpoints have 
occurred, kill the master and start standby to recover that from 
archive. After it has replayed all the WAL, stop the standby and restart it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-24 Thread Heikki Linnakangas

On 24.11.2010 12:48, Heikki Linnakangas wrote:

On 24.11.2010 06:56, Joachim Wieland wrote:

On Tue, Nov 23, 2010 at 8:45 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

On 19.11.2010 23:46, Joachim Wieland wrote:


FATAL: too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
pArray-maxKnownAssignedXids: 6890


Hmm, that's a lot of entries in KnownAssignedXids.

Can you recompile with WAL_DEBUG, and run the recovery again with
wal_debug=on ? That will print all the replayed WAL records, which is
a lot
of data, but it might give a hint what's going on.


Sure, but this gives me only one more line:

[...]
LOG: redo starts at 1F8/FC00E978
LOG: REDO @ 1F8/FC00E978; LSN 1F8/FC00EE90: prev 1F8/FC00E930; xid
385669; len 21; bkpb1: Heap - insert: rel 1663/16384/18373; tid
3829898/23
FATAL: too many KnownAssignedXids
CONTEXT: xlog redo insert: rel 1663/16384/18373; tid 3829898/23
LOG: startup process (PID 4587) exited with exit code 1
LOG: terminating any other active server processes


Thanks, I can reproduce this now. This happens when you have a wide gap
between the oldest still active xid and the latest xid.

When recovery starts, we fetch the oldestActiveXid from the checkpoint
record. Let's say that it's 100. We then start replaying WAL records
from the Redo pointer, and the first record (heap insert in your case)
contains an Xid that's much larger than 100, say 1. We call
RecordKnownAssignedXids() to make note that all xids between that range
are in-progress, but there isn't enough room in the array for that.

We normally get away with a smallish array because the array is trimmed
at commit and abort records, and the special xid-assignment record to
handle the case of a lot of subtransactions. We initialize the array
from the running-xacts record that's written at a checkpoint. That
mechanism fails in this case because the heap insert record is seen
before the running-xacts record, causing all those xids in the range
100-1 to be considered in-progress. The running-xacts record that
comes later would prune them, but we don't have enough slots to hold
them until that.

Hmm. I'm not sure off the top of my head how to fix that. Perhaps stash
the xids we see during WAL replay in private memory instead of putting
them in the KnownAssignedXids array until we see the running-xacts record.


Looking closer at RecordKnownAssignedTransactionIds(), there's a related 
much more serious bug there too. When latestObservedXid is initialized 
to the oldest still-running xid, oldestActiveXid, at WAL recovery, we 
zero the CLOG starting from the oldestActiveXid. That will zap away the 
clog bits of any old transactions that had already committed before the 
checkpoint started, but were younger than the oldest still running 
transaction. The transactions will be lost :-(.


It's dangerous to initialize latestObservedXid to anything to an older 
value. The idea of keeping the seen xids in a temporary list private to 
the startup process until the running-xacts record would solve that 
problem too. ProcArrayInitRecoveryInfo() would not be needed anymore, 
the KnownAssignedXids tracking would start at the first running-xacts 
record (or shutdown checkpoint) we see, not any sooner than that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-24 Thread Heikki Linnakangas

On 24.11.2010 13:38, Heikki Linnakangas wrote:

It's dangerous to initialize latestObservedXid to anything to an older
value.


older value than the nextXid-1 from the checkpoint record, I meant to say.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-24 Thread Simon Riggs
On Wed, 2010-11-24 at 12:48 +0200, Heikki Linnakangas wrote:
 When recovery starts, we fetch the oldestActiveXid from the checkpoint
 record. Let's say that it's 100. We then start replaying WAL records 
 from the Redo pointer, and the first record (heap insert in your case)
 contains an Xid that's much larger than 100, say 1. We call 
 RecordKnownAssignedXids() to make note that all xids between that
 range are in-progress, but there isn't enough room in the array for
 that.

Agreed.

 Hmm. I'm not sure off the top of my head how to fix that. Perhaps
stash 
 the xids we see during WAL replay in private memory instead of
 putting 
 them in the KnownAssignedXids array until we see the running-xacts
 record.

Moving LogStandbySnapshot() earlier will help but won't solve it fully.

Will think.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-23 Thread Joachim Wieland
On Tue, Nov 23, 2010 at 8:45 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 19.11.2010 23:46, Joachim Wieland wrote:

 FATAL:  too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
 pArray-maxKnownAssignedXids: 6890

 Hmm, that's a lot of entries in KnownAssignedXids.

 Can you recompile with WAL_DEBUG, and run the recovery again with
 wal_debug=on ? That will print all the replayed WAL records, which is a lot
 of data, but it might give a hint what's going on.

Sure, but this gives me only one more line:

[...]
LOG:  redo starts at 1F8/FC00E978
LOG:  REDO @ 1F8/FC00E978; LSN 1F8/FC00EE90: prev 1F8/FC00E930; xid
385669; len 21; bkpb1: Heap - insert: rel 1663/16384/18373; tid
3829898/23
FATAL:  too many KnownAssignedXids
CONTEXT:  xlog redo insert: rel 1663/16384/18373; tid 3829898/23
LOG:  startup process (PID 4587) exited with exit code 1
LOG:  terminating any other active server processes


Joachim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-22 Thread Joachim Wieland
On Sun, Nov 21, 2010 at 11:48 PM, Fujii Masao masao.fu...@gmail.com wrote:
 --
 If you suspect a bug in Hot Standby, please set
        trace_recovery_messages = DEBUG2
 in postgresql.conf and repeat the action

 Always useful to know
 * max_connections
 * current number of sessions
 * whether we have two phase commits happening
 --

The trace_recovery_messages parameter does not give more output...

max_connections is set to 100

there have been no sessions on the standby itself, but a few on the
primary database, I don't know how much but probably not more than 10.
The sessions there were doing quite some load however, among them
slony synchronization, since the hot standby master database was
actually a slony replica.

max_prepared_transactions has not been changed from its default value of 0.


Joachim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-22 Thread Heikki Linnakangas

On 19.11.2010 23:46, Joachim Wieland wrote:

FATAL:  too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
pArray-maxKnownAssignedXids: 6890


Hmm, that's a lot of entries in KnownAssignedXids.

Can you recompile with WAL_DEBUG, and run the recovery again with 
wal_debug=on ? That will print all the replayed WAL records, which is a 
lot of data, but it might give a hint what's going on.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-21 Thread Fujii Masao
On Sat, Nov 20, 2010 at 6:46 AM, Joachim Wieland j...@mcknight.de wrote:
 I still have the server, if you want me to debug anything or send a
 patch against 9.0.1 that gives more output, just let me know.

Per previous Simon's comment, the following information would be useful.
http://archives.postgresql.org/pgsql-general/2010-10/msg00154.php

--
If you suspect a bug in Hot Standby, please set
trace_recovery_messages = DEBUG2
in postgresql.conf and repeat the action

Always useful to know
* max_connections
* current number of sessions
* whether we have two phase commits happening
--

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby b-tree delete records review

2010-11-09 Thread Simon Riggs
On Tue, 2010-11-09 at 13:34 +0200, Heikki Linnakangas wrote:
 (cleaning up my inbox, and bumped into this..)
 
 On 22.04.2010 12:31, Simon Riggs wrote:
  On Thu, 2010-04-22 at 12:18 +0300, Heikki Linnakangas wrote:
  Simon Riggs wrote:
  On Thu, 2010-04-22 at 11:56 +0300, Heikki Linnakangas wrote:
 
  If none of the removed heap tuples were present anymore, we currently
  return InvalidTransactionId, which kills/waits out all read-only
  queries. But if none of the tuples were present anymore, the 
  read-only
  queries wouldn't have seen them anyway, so ISTM that we should treat
  InvalidTransactionId return value as we don't need to kill anyone.
  That's not the point. The tuples were not themselves the sole focus,
  Yes, they were. We're replaying a b-tree deletion record, which removes
  pointers to some heap tuples, making them unreachable to any read-only
  queries. If any of them still need to be visible to read-only queries,
  we have a conflict. But if all of the heap tuples are gone already,
  removing the index pointers to them can'ẗ change the situation for any
  query. If any of them should've been visible to a query, the damage was
  done already by whoever pruned the heap tuples leaving just the
  tombstone LP_DEAD item pointers (in the heap) behind.
  You're missing my point. Those tuples are indicators of what may lie
  elsewhere in the database, completely unreferenced by this WAL record.
  Just because these referenced tuples are gone doesn't imply that all
  tuple versions written by the as yet-unknown-xids are also gone. We
  can't infer anything about the whole database just from one small group
  of records.
  Have you got an example of that?
 
  I don't need one, I have suggested the safe route. In order to infer
  anything, and thereby further optimise things, we would need proof that
  no cases can exist, which I don't have. Perhaps we can add yet, not
  sure about that either.
 
  It's good to be safe rather than sorry, but I'd still like to know
  because I'm quite surprised by that, and got me worried that I don't
  understand how hot standby works as well as I thought I did. I thought
  the point of stopping replay/killing queries at a b-tree deletion record
  is precisely that it makes some heap tuples invisible to running
  read-only queries. If it doesn't make any tuples invisible, why do any
  queries need to be killed? And why was it OK for them to be running just
  before replaying the b-tree deletion record?
 
  I'm sorry but I'm too busy to talk further on this today. Since we are
  discussing a further optimisation rather than a bug, I hope it is OK to
  come back to this again later.
 
 Would now be a good time to revisit this? I still don't see why a b-tree 
 deletion record should conflict with anything, if all the removed index 
 tuples point to just LP_DEAD tombstones in the heap.

I want what you say to be true. The question is: is it? We just need to
explain why that will never be a problem.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   3   4   5   6   7   8   9   10   >